Skip to content

sort: clarify Slice comparator requirements #73420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jagprog5 opened this issue Apr 17, 2025 · 12 comments · May be fixed by #73421
Open

sort: clarify Slice comparator requirements #73420

jagprog5 opened this issue Apr 17, 2025 · 12 comments · May be fixed by #73421
Labels
Documentation Issues describing a change to documentation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@jagprog5
Copy link

jagprog5 commented Apr 17, 2025

The doc for sort.Slice states:

// The less function must satisfy the same requirements as
// the Interface type's Less method.

This is referring to the less function interface here:

// Less must describe a transitive ordering:
//  - if both Less(i, j) and Less(j, k) are true, then Less(i, k) must be true as well.
//  - if both Less(i, j) and Less(j, k) are false, then Less(i, k) must be false as well.

This describes only transitive ordering. In contrast, the underlying algorithm, pdqsort, requires strict weak ordering.

This is an important distinction. If a comparator is, for example:

  • reflexive
  • transitive

Then it will be conforming to the documented requirements yet will still cause crashes under nebulous implementation defined circumstances. e.g. the number of elements to sort is greater than 33, and elements happen to be in a specific order.

To fix:

Change the documented requirement of sort.Slice to what is stated in sort.SortFunc.

// SortFunc sorts the slice x in ascending order as determined by the cmp
// function. This sort is not guaranteed to be stable.
// cmp(a, b) should return a negative number when a < b, a positive number when
// a > b and zero when a == b or a and b are incomparable in the sense of
// a strict weak ordering.
//
// SortFunc requires that cmp is a strict weak ordering.
// See https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings.
// The function should return 0 for incomparable items.
@jagprog5 jagprog5 linked a pull request Apr 17, 2025 that will close this issue
@gabyhelp
Copy link

Related Issues

Related Code Changes

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@gabyhelp gabyhelp added the Documentation Issues describing a change to documentation. label Apr 17, 2025
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/666355 mentions this issue: sort/slice: fix doc for sort.Slice

@randall77
Copy link
Contributor

I'm not certain that changing the spec is the way forward here. I'd rather change the implementation so that it conforms to the spec as written.
What is the case that causes a problem here? Is it when less(x,x) returns true? Can we detect that somehow in pdqsort and handle it, or maybe even bail out to a slower sort?

@jagprog5
Copy link
Author

jagprog5 commented Apr 17, 2025

@randall77 think of this as a clarification and not a change to the doc. the doc is written poorly. practically speaking, everyone assumed strict weak ordering. but it was written in a way that, I'd argue, gave weaker requirements than is necessary.

I've changed the title

@jagprog5 jagprog5 changed the title incorrect doc sort.Slice comparator clarify doc sort.Slice comparator Apr 17, 2025
@randall77
Copy link
Contributor

Should we just require a strict weak ordering everywhere then?
(I'm not sure why in the CL sort.Slice and sort.SliceStable have explicitly different requirements.)

@jagprog5
Copy link
Author

jagprog5 commented Apr 17, 2025

Should we just require a strict weak ordering everywhere then?

Yes, I've changed it. we already require a strict weak ordering everywhere, this just clarifies that.

As for the difference in sort vs stable, I've dropped that. There is a different in the requirements only from the implementation, but not the doc. I'd prefer it to be part of the doc / spec as well, but that would require more work in justifying. And, I see that C++ stdlib doesn't provide a public distinction either, so it's probably best to just stay consistent anyway.

@seankhliao seankhliao changed the title clarify doc sort.Slice comparator sort: clarify Slice comparator requirements Apr 17, 2025
@prattmic prattmic added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Apr 18, 2025
@prattmic prattmic added this to the Backlog milestone Apr 18, 2025
@randall77
Copy link
Contributor

Do you have an example of a sort that actually goes wrong with a non-strict-weak-ordering comparison? Particularly, an ordering which is transitive and asymmetric but is (sometimes at least) reflexive.

@jagprog5
Copy link
Author

Sure, here's an example which satisfies the letter but not the spirit of the current spec:

fn less(a, b):  // pseudocode
    true        // intended to trivially breaks irreflexive property (CRASH)

To clarify why this works, let's look at the current spec:

// Less reports whether the element with index i
// must sort before the element with index j.

This says nothing. We are defining what less means! So it's circular.

// If both Less(i, j) and Less(j, i) are false,
// then the elements at index i and j are considered equal.

This part doesn't matter, since like above, we are defining what "equal" means through the comparator so this part also says nothing. But anyways: requirement is never used since Less(i, j) and Less(j, i) will never be false, both are stated literally as true.

// Sort may place equal elements in any order in the final result,
// while Stable preserves the original input order of equal elements.

Not applicable.

// Less must describe a transitive ordering:
//  - if both Less(i, j) and Less(j, k) are true, then Less(i, k) must be true as well.

This will always be satisfied since Less(i, k) is stated literally to be true regardless of preconditions.

//  - if both Less(i, j) and Less(j, k) are false, then Less(i, k) must be false as well.

Will never occur since Less(i, j) and Less(j, k) are both stated literally to be true.

@randall77
Copy link
Contributor

// Less reports whether the element with index i
// must sort before the element with index j.
This says nothing. We are defining what less means! So it's circular.

I don't agree. This says a lot. To me it means that Less(i, j) implies that s[i] must appear before s[j] in the output. It is not possible to satisfy that if i!=j and both Less(i, j) and Less(j, i) are true.

And this presumably also implies that we can't have Less(i, i) be true, because then s[i] must appear before itself. Depending on whether you agree that "before" here means "strictly before", which I think is clear but maybe there's a bit more wiggle room than the previous paragraph.

@jagprog5
Copy link
Author

jagprog5 commented Apr 23, 2025

That you for your patience @randall77

I don't agree. This says a lot.

Ok fair enough! So, if this statement:

// Less reports whether the element with index i
// must sort before the element with index j.

Describes everything that is needed. From that alone, we can derive all three strict weak ordering properties.

But then why is the transitive ordering line stated below that? It's confusing as it's like an elaboration on the specifics but only stating one.

There is a standard way of stating these requirements and only this one part of the stdlib isn't using it. If the documentation for Less is defining a strict weak ordering, then it should say so using those words for clarity! I've got someone at my work who defined a complex comparator but didn't know these properties, and improving the doc is helpful for others.

@randall77
Copy link
Contributor

Describes everything that is needed. From that alone, we can derive all three strict weak ordering properties.

I don't think you can derive transitivity from that.

Consider the case where only Less(1,2) and Less(2,3) return true. It is easy to satisfy "Less(i,j) => s[i] appears before s[j]", but that isn't a transitive relation. Particularly, Less(1,3) is false. In other words, !Less(1,3) does not imply that s[3] appears after s[1].

@randall77
Copy link
Contributor

I agree that we should probably make all the docs consistent, if in fact all our implementations require the same thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues describing a change to documentation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants