-
Notifications
You must be signed in to change notification settings - Fork 18k
hash: add Clone #69521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
How do you intend to implement the fallback path in Clone? MarshalBinary + UnmarshalBinary only gives you the ability to save and restore a Hash's state, not construct a new instance of it. You might be able to do it with reflect but it sounds like we're trying to avoid reflect. |
I'll also point out that, when using Clone to optimize repeated hashes with the same prefix, you really want to pair it with a Set method which copies the hash state from one existing instance to another. e.g. h := newHash()
h.Write(prefix)
h0 := h.Clone() // allocates
for _, message := range messages {
h.Set(h0) // doesn't allocate
h.Write(message)
fmt.Println(h.Sum(nil))
} Otherwise, without Set, you end up creating unavoidable garbage on every message. h0 := newHash()
h0.Write(prefix)
for _, message := range messages {
h := h0.Clone() // allocates
h.Write(message)
fmt.Println(h.Sum(nil))
} My initial stab at the hmac optimization (before hashes implemented BinaryMarshaler and BinaryUnmarshaler) combined type HashCloner interface {
hash.Hash
// Clone returns a copy of its reciever, reusing the provided Hash if possible
Clone(hash.Hash) hash.Hash
} It looks a little odd but ends up being fairly ergonomic. One advantage this definition has over separate Clone and Set methods is that there is a nice answer for what to do when the argument and receiver types do not match: just allocate a new value. Whereas // Example implementation for sha1.digest
func (d0 *digest) Clone(h hash.Hash) hash.Hash {
d, ok := h.(*digest)
if !ok {
d = new(digest)
}
*d = *d0
return d
} h0 := newHash()
h0.Write(prefix)
var h hash.Hash
for _, message := range messages {
h = h0.Clone(h) // allocates on first iteration, reuses h on subsequent iterations
h.Write(message)
fmt.Println(h.Sum(nil))
} |
This proposal tackles the same problem that made me start drafting #69293: there is no clean way to clone hashes. About the proposed My motivation to have a common way to clone hash objects is to improve the compatibility of several hash implementations that I've implemented using CNG and OpenSSL with those libraries that are currently using MarshalBinary + UnmarshalBinary to clone a hash. The issue is that CNG/OpenSSL don't provide an API to serialize the internal state of a given hash, but they do provide APIs to clone a hash object. |
Doh. Yes, that doesn't make sense, thank you. We discussed this with @rsc and it it would make sense to follow the
The question is whether we can make it not allocate by a combination of devirtualization and inlining. I think the answer is yes if either the type of the Hash passed to Clone devirtualizes or if it's a concrete type. Notably, I think the This isn't really urgent for Go 1.24. I don't want to add Clone methods to the new crypto/sha3 types (#69982) until we decide this, because there's a risk we'll make them not implement the interface, but crypto/sha3 can start with just MarshalBinary/UnmarshalBinary like every other stdlib Hash. |
Currently it would not work, see #64824. |
@FiloSottile the io/fs pattern always includes the base interface, so: type CloneHash interface {
hash.Hash
Clone() hash.Hash
} What do you think? EDIT: or even: type CloneHash interface {
hash.Hash
Clone() hash.CloneHash
} |
Ah yes, that's what I meant to write. Not sure about returning a |
This way we also make sure that the returned (cloned) hash implements the |
Has a generic hash.Clone been ruled out? |
@FiloSottile Do we plan to implement this new interface by the |
Or instead we can check whether the hash implements the new interface in |
Good point on crypto/hmac. I think I like the option of choosing the implementation in |
This might cause issues in the future if we would like to make Lines 34 to 37 in bea9b91
|
This proposal has been added to the active column of the proposals project |
Talked to @FiloSottile about this and we agreed to leave this for Go 1.25. |
It sounds like we agree on:
Do I have that right? (I'm assuming we ignore CloneXOF until XOF is accepted, and it can be part of the XOF proposal.) |
Personally i think that we should treat |
Given that we're trying to move toward concrete hash types, it would be nice if you could clone a concrete hash type and get back the same concrete hash type. However, I can't quite figure out how to make this work because General question: Why do people need to clone hash functions? I think understanding this would help us in evaluating this proposal. @magical mentioned using this to "optimize repeated hashes with the same prefix", but I don't actually know why people would want to do that either. |
You do not always have a |
That's a great point that The marshal/unmarshal approach was always a bit of a hack, and it's not clear how widely supported it was outside of std, so between that and the awkwardness of implementing this fallback in a Hence, it seems like we should just do the |
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add the following to the package hash
// A Cloner is a hash function whose state can be cloned.
// All hashes in the standard library implement this interface.
type Cloner interface {
hash.Hash
Clone() hash.Hash
} And to implement the As a follow-on, ideally the compiler would implement optimizatoins such that, if the concrete type of the hash is fixed, it's possible to write an allocation-free Clone method. |
Change https://go.dev/cl/644275 mentions this issue: |
Change https://go.dev/cl/644315 mentions this issue: |
No change in consensus, so accepted. 🎉 The proposal is to add the following to the package hash
// A Cloner is a hash function whose state can be cloned.
// All hashes in the standard library implement this interface.
type Cloner interface {
hash.Hash
Clone() hash.Hash
} And to implement the As a follow-on, ideally the compiler would implement optimizatoins such that, if the concrete type of the hash is fixed, it's possible to write an allocation-free Clone method. |
I don't really see the point of Clone as accepted. In the long run i think we want to move towards having the hash packages return concrete types (as is done in crypto/internal/fips140). In that world, it would make more sense for Clone to return the concrete type as well (obviating any need for devirtualization). In fact, most hashes will not even need a Clone method because a shallow copy is enough. You only really need the Cloner interface for cases like crypto/hmac where the underlying hash is not known at compile time, and all the compiler optimization in the world will not help you avoid the allocations in that case. (Unless it is paired with a Set method, as i described in my earlier comment.) |
It seems perfectly reasonable to write code that is not specific to a single hash and yet still wants to Clone the hash. |
Change https://go.dev/cl/649195 mentions this issue: |
Change https://go.dev/cl/652036 mentions this issue: |
@FiloSottile @aclements would be nice if this proposal could land in Go 1.25. There are some CLs already submitted ready for review implementing it. Is there anything else missing? |
Change https://go.dev/cl/670178 mentions this issue: |
Change https://go.dev/cl/670179 mentions this issue: |
Two API questions have come up again during implementation:
For 1, I don't feel strongly, but lean slightly toward changing this to return For 2, the options are a) you just can't clone an hmac, b) hmac checks at construction time whether the underlying hash is clonable and returns one of two different implementation types, or c) hmac always implements Clone, but we have to tweak the API so Clone can fail if the underlying hash doesn't implement Clone (or it does but that Clone fails). Option b is nice, but not very scalable. If we were to add another extension/optional/upgrade interface in the future, we'd need four hmac types. I'm leaning toward option c, and specifying that Clone can return Together, this would look like: package hash
// A Cloner is a hash function whose state can be cloned.
// All hashes in the standard library implement this interface.
type Cloner interface {
Hash
// Clone returns a copy of the Hash with identical and
// independent internal state.
//
// If the hash cannot be cloned, it returns an error that
// wraps [errors.ErrUnsupported].
Clone() (Cloner, error)
} |
I also feel that way, i also would assume that some people would like to pass a
That is nice, but having an error just for one state does not feel great, it could be a bool or we could also allow a nil return to represent that a hash is not cloneable instead. But we should also note that with the addition of an additional error, it might require us to add a global helper Clone func (as before, see: #69521 (comment)), because now to clone a hash you have to make an type assertion and also error check (bunch of code just to clone a hash). |
Re the failure case where a hash doesn't implement the interface, is this essentially only going to be the case for hashes implemented outside of the standard library? (Since we explicitly document I'm not really sure how common this is. There are plausibly hashes we don't implement, but I'm not sure I've ever actually seen anyone do this. Of course I'm not saying it doesn't happen, but I'm wondering how much complexity we should incur to handle this case, especially if it's extremely rare. |
This proposal has been added to the active column of the proposals project |
There isn't a general way to clone the state of a hash.Hash, but #20573 introduced the concept of hash.Hash implementations also implementing encoding.BinaryMarshaler and encoding.BinaryUnmarshaler, and the hash.Hash docs commit our implementations to doing that.
That allows cloning the hash state without recomputing it, as done in HMAC.
go/src/crypto/hmac/hmac.go
Lines 96 to 103 in db40d1a
However, it's obscure and pretty clunky to use.
I propose we add a
hash.Clone
helper function.In practice, we should only fallback to BinaryMarshaler + BinaryUnmarshaler for the general case, while for standard library implementations we can do an undocumented interface upgrade to
interface { Clone() Hash }
. In that sense,hash.Clone
is a way to hide the interface upgrade as a more discoverable and easier to use function.(Yet another example of why we should be returning concrete types everywhere rather than interfaces.)
CloneXOF
If #69518 is accepted, I propose we also add hash.CloneXOF.
None of our XOFs actually implement BinaryMarshaler + BinaryUnmarshaler, but they have their own interface methods
Clone() ShakeHash
andClone() XOF
that each return an interface. I can't really think of a way to use them from CloneXOF, so instead we can add hidden methodsCloneXOF() hash.XOF
and interface upgrade to them.As we look at moving packages from x/crypto to the standard library (#65269) we should switch x/crypto/sha3 and x/crypto/blake2[bs] from returning interfaces to returning concrete types, at least for XOFs. Then they can have a
Clone()
method that returns a concrete type, and aCloneXOF()
method that returns a hash.XOF interface and enableshash.CloneXOF
.(If anyone has better ideas for how to make this less redundant, I would welcome them. I considered and rejected using reflect to call the existing Clone methods because hash is a pretty core package. This sort of interface-method-that-needs-to-return-a-value-implementing-said-interface scenarios are always annoying.)
/cc @golang/security @cpu @qmuntal (who filed something similar in #69293, as I found while searching refs for this)
The text was updated successfully, but these errors were encountered: