Skip to content

allow sorting keys on to_json and to_python by passing in sort_keys #1637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

aezomz
Copy link

@aezomz aezomz commented Feb 15, 2025

Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense?
Since I have been trying to do model_dump_json with sort keys too.

This feature should simulate the same as how we use json.dumps(data, sort_keys=True)

Will sort from:
        {
            'field_123': b'test_123',
            'field_b': 12,
            'field_a': b'test',
            'field_c': {'mango': 2, 'banana': 3, 'apple': 1},
            'field_n': [
                {'mango': 3, 'banana': 2, 'apple': 1},
                [{'mango': 3, 'banana': 2, 'apple': 1}, {'d': 3, 'b': 2, 'a': 1}],
                3,
            ],
            'field_d': [
                {'d': 3, 'b': 2, 'a': {'nested3': 3, 'nested1': 1, 'nested2': 2}},
                [[{'mango': 3, 'banana': 2, 'apple': 1}], {'d': 3, 'b': 2, 'a': 1}],
                3,
            ],
            'field_none': None,
        }
To:
    assert s.to_python(m, exclude_none=True, sort_keys=True) == snapshot(
        {
            'field_123': b'test_123',
            'field_a': b'test',
            'field_b': 12,
            'field_c': {'apple': 1, 'banana': 3, 'mango': 2},
            'field_n': [
                {'apple': 1, 'banana': 2, 'mango': 3},
                [{'apple': 1, 'banana': 2, 'mango': 3}, {'a': 1, 'b': 2, 'd': 3}],
                3,
            ],
            'field_d': [
                {'a': {'nested1': 1, 'nested2': 2, 'nested3': 3}, 'b': 2, 'd': 3},
                [[{'apple': 1, 'banana': 2, 'mango': 3}], {'a': 1, 'b': 2, 'd': 3}],
                3,
            ],
        }
    )

Take note that

  1. field_d is extra and still manage to sort
  2. sorting recursively for both defined schema and extras
  3. sorting including dictionary in array or nested array array

Please let me know if I miss out any other features that sort_keys=True is suppose to do!

Thanks!

Change Summary

allow sorting keys on to_json and to_python by passing in sort_keys

Related issue number

should fix pydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

Copy link

codecov bot commented Feb 15, 2025

Codecov Report

Attention: Patch coverage is 85.77075% with 36 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/serializers/fields.rs 84.91% 35 Missing ⚠️
python/pydantic_core/core_schema.py 50.00% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copy link

codspeed-hq bot commented Feb 15, 2025

CodSpeed Performance Report

Merging #1637 will not alter performance

Comparing aezomz:allow_model_dump_sort_keys (53186da) with main (334a9b0)

Summary

✅ 157 untouched benchmarks

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 07a31f5 to 7222c8d Compare March 4, 2025 10:22
@aezomz
Copy link
Author

aezomz commented Mar 4, 2025

please review

@aezomz aezomz marked this pull request as ready for review March 4, 2025 13:19
Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main issue is perf regression

@aezomz
Copy link
Author

aezomz commented Mar 5, 2025

I separated the test out, I also refactor the functions so we can reuse when sort_keys=true.
To keep the original perf benchmark, I have done a simple bool check on sort_keys before using expensive function like sorting.

Let me know what else I need to improve. Thanks

@zzstoatzz zzstoatzz mentioned this pull request Mar 10, 2025
@aezomz
Copy link
Author

aezomz commented Mar 18, 2025

please review, not sure how I can take it from here

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 0cf4b6d to 95f9329 Compare March 23, 2025 14:44
@aezomz
Copy link
Author

aezomz commented Mar 23, 2025

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

Added different sort mode as above, updated the PR description.

@aezomz
Copy link
Author

aezomz commented Mar 25, 2025

please review 👍

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 95f9329 to c83a212 Compare March 28, 2025 17:28
@aezomz
Copy link
Author

aezomz commented Mar 28, 2025

please review 👍

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from c83a212 to 0075a4b Compare March 31, 2025 17:30
@aezomz
Copy link
Author

aezomz commented Apr 1, 2025

please review 👍

There is this test that is failing, unclear if its related to my change.
build-wasm-emscripten

Let me know any other places i can optimize. Thanks!

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from dce2689 to 5ce696c Compare April 4, 2025 15:47
@aezomz
Copy link
Author

aezomz commented Apr 11, 2025

please review 👍

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from dfe7380 to 1b8d824 Compare April 14, 2025 17:48
@aezomz
Copy link
Author

aezomz commented Apr 15, 2025

please review 👍

1 similar comment
@aezomz
Copy link
Author

aezomz commented Apr 16, 2025

please review 👍

@aezomz aezomz requested a review from adriangb April 23, 2025 15:15
@DouweM DouweM self-assigned this Apr 23, 2025
@DouweM DouweM self-requested a review April 23, 2025 19:24
@aezomz
Copy link
Author

aezomz commented Apr 25, 2025

@DouweM please review :)

@DouweM DouweM force-pushed the allow_model_dump_sort_keys branch from dc0ec59 to 11501f6 Compare April 25, 2025 21:58
Copy link
Contributor

@DouweM DouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @aezomz! I'm a Rust newbie as well, but I have a feeling we can significantly simplify this code by dropping the recursive dict sorting and moving this to the serializer for the dict type specifically. Can you give that a try please?

I also think it's worth seeing if we can reduce the duplication between the if sort_keys/else branches, but I know Rust typing may make that tricky...

Let me know if you get stuck, we can bring in some of our Rust experts!

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 11501f6 to 200865b Compare April 26, 2025 10:29
Copy link
Contributor

@DouweM DouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.

I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.

@davidhewitt Would you mind having a Rusty eyed look for us? :)

if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? {
// Get potentially sorted value
if extra.sort_keys {
let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still need to sort_dict_recursive here if we're already doing that in the dict serialization itself? (Same question for the 2 calls below)

let value =
value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?;
let value = if extra.sort_keys {
let sorted_value = sort_dict_recursive(py, &value)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not do a recursive sort here, but sort new_dict specifically after we've built it here?

);
map.serialize_entry(&key, &value_serialize)?;
if extra.sort_keys {
let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above -- I'd like to get rid of recursive sorting and do this after we build the map here. But I'm not 100% sure the data types here allow that...

Copy link
Contributor

@DouweM DouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.

I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.

@davidhewitt Would you mind having a Rusty eyed look for us? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sorting keys to model_dump_json
5 participants