Vulnerability to collision attacks #319
A few ideas – maybe some more experienced people can comment whether these are any good:
EDIT 2020-10-03:
Note that a random hash salt has very limited security benefits as long as a weak hash function like FNV is used. For FNV, it is possible to construct multi-collisions that collide no matter what salt they are hashed with: https://medium.com/@robertgrosse/generating-64-bit-hash-collisions-to-dos-python-5b21404a5306
Thanks for looking into this! My only bit of feedback is this: because there are existing, deployed services that are vulnerable to this attack, and because shutting those services down or replacing them with something that doesn't use `unordered-containers` isn't feasible in the short term, interim mitigations are valuable. In other words, please let's not let perfect be the enemy of good-enough-for-now here. Even something that makes producing collisions slightly more difficult is helpful at this stage, IMO. I'm encouraged that you've jumped right into mitigations in your first 2 posts, so it looks like this discussion is off to a great start!
Is there a way to mitigate the performance degradation of a random seed? How bad does it measure out?
✨ This is an old work account. Please reference @brandonchinn178 for all future communication ✨ I also opened this issue for another possible mitigation: haskell-unordered-containers/hashable#218
IIUC it's not that a random seed will reduce performance; it's that one would get different hash values every time one restarts the application.
More ideas: […]
+1 to the […]
Why should the function be in IO? We already have a class that captures needing random state: `RandomGen`. A PRNG would be good enough to solve this problem; real randomness is not needed.
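Indeed, a pure generator step suffices. As a minimal sketch (a hand-rolled splitmix64 step is used here instead of the `random` package's `RandomGen`, purely to keep the snippet dependency-free; `saltFromSeed` is an invented name, not proposed API), a salt can be derived without any IO from a caller-supplied seed:

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- One splitmix64 step: returns an output word and the advanced state.
-- The output mix is a bijection, so distinct states yield distinct outputs.
splitMix64 :: Word64 -> (Word64, Word64)
splitMix64 s0 =
  let s1 = s0 + 0x9e3779b97f4a7c15
      z0 = (s1 `xor` (s1 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      z1 = (z0 `xor` (z0 `shiftR` 27)) * 0x94d049bb133111eb
  in (z1 `xor` (z1 `shiftR` 31), s1)

-- Hypothetical helper: derive a hash salt from a caller-supplied seed.
saltFromSeed :: Word64 -> Word64
saltFromSeed = fst . splitMix64
```

The caller decides where the seed comes from (startup entropy, a config value, a per-map counter), which sidesteps the question of IO entirely.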
A random seed will most likely be implemented something like this:

```haskell
the_random_seed :: Seed
the_random_seed = unsafePerformIO ...
{-# NOINLINE the_random_seed #-}
```

This means that every (initial) seed access has to check a tag and follow a pointer. The seed will almost always be in L1 cache when heavy […]
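For reference, a runnable variant of that global-seed pattern (illustrative only: the monotonic clock stands in for a real entropy source and is not cryptographically secure; the name is invented):

```haskell
import GHC.Clock (getMonotonicTimeNSec)
import System.IO.Unsafe (unsafePerformIO)
import Data.Word (Word64)

-- Process-global seed, initialized exactly once on first use (it is a CAF).
-- NOINLINE keeps GHC from duplicating the unsafePerformIO call, which could
-- otherwise produce different seeds at different use sites.
theRandomSeed :: Word64
theRandomSeed = unsafePerformIO getMonotonicTimeNSec
{-# NOINLINE theRandomSeed #-}
```

Every later read is just a CAF dereference, which is the tag-check-and-pointer-follow cost described above.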
Here's some earlier discussion with @tibbe and @infinity0 on mitigating collision attacks: #265 (comment)

The gist is that in order to make it sufficiently hard for an attacker to produce hash collisions, you need both a strong hash function like SipHash and a random seed. The problem with SipHash is that apparently it's so slow that you might as well switch to `Data.Map`.

Regarding the proposed fix in #217, @tibbe's assessment was that it would still require a strong hash function and a random seed to be reasonably secure. With many weaker hash functions, it is possible to generate seed-independent collisions, see e.g. this blog post. By this assessment, #217 "adds" little security of its own.
Storing the salt within the `HashMap` […]
Storing salt in a map leads to big problems with intersection, union, and difference.
If you'd store the salt in the map, it would also be a good idea to have a […]
Hit-and-run comment: you often need to have a secure hash function even in libraries that you don't control (e.g. in some http library), so explicitly creating hash maps using a […]
Here are my current thoughts on these questions, much inspired by the internal discussion with @tibbe:
Therefore most users will only choose […]. So, while I think that making […], for any security schemes that require a strong hash function, we'll first need to find a hash function that is fast enough. The SipHash versions/implementations that have been tried so far were reported to be too slow.

For the choice of hash function I'll need to rely on expert opinions: Are, say, Blake3 or HighwayHash strong enough? For the performance testing we'll need benchmarks.
What about mitigation measures that do not require a strong hash function? The main idea that I'm aware of is to store collisions in a […]. Therefore I think that, rather than adopting this approach in […]
I've opened #320 to add a security advisory to the package description. We can update the recommendation within once we have a better story for mitigating collision attacks. |
Will read.
How about using two hash functions? The top-level hash table uses a fast but vulnerable hash. Normally you expect a small number of collisions, and you can resolve these by linear search in the list of colliding values. If the number of entries in a collision bucket exceeds some threshold, you start storing them in a new hash table which employs a slow but collision-resistant hash. The threshold is chosen so that the time to perform a linear search is equal to the time needed to compute the slow hash value. At this point the upper bound on the CPU time for a new value is the time required for the slow hash function, but non-pathological inputs will still work fast.
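For concreteness, here is a self-contained toy of that two-level scheme (all names are invented; `strongHash` is a cheap stand-in mixer rather than an actual collision-resistant hash, and the threshold is arbitrary): a bucket is searched linearly while small and re-keyed by the strong hash once it grows past the threshold.

```haskell
import qualified Data.Map.Strict as M

-- Bucket size beyond which we pay for the "slow" hash (arbitrary here).
threshold :: Int
threshold = 8

-- Stand-in for a slow, collision-resistant hash such as SipHash.
strongHash :: Int -> Int
strongHash k = fromIntegral ((fromIntegral k * 0x9e3779b97f4a7c15) :: Word)

data Bucket v
  = Flat [(Int, v)]                 -- linear search while small
  | Promoted (M.Map Int [(Int, v)]) -- re-keyed by strongHash once large

emptyB :: Bucket v
emptyB = Flat []

insertB :: Int -> v -> Bucket v -> Bucket v
insertB k v (Flat kvs)
  | any ((== k) . fst) kvs     = Flat ((k, v) : filter ((/= k) . fst) kvs)
  | length kvs + 1 > threshold = promote ((k, v) : kvs)
  | otherwise                  = Flat ((k, v) : kvs)
insertB k v (Promoted m) =
  Promoted (M.insertWith replaceKey (strongHash k) [(k, v)] m)
  where replaceKey new old = new ++ filter ((/= k) . fst) old

promote :: [(Int, v)] -> Bucket v
promote kvs =
  Promoted (M.fromListWith (++) [ (strongHash k, [kv]) | kv@(k, _) <- kvs ])

lookupB :: Int -> Bucket v -> Maybe v
lookupB k (Flat kvs)   = lookup k kvs
lookupB k (Promoted m) = M.lookup (strongHash k) m >>= lookup k
```

Under this design the slow hash is only ever computed for buckets an attacker has already managed to overload, so benign workloads keep the fast path.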
@tibbe, I think @PaulJohnson's comment sounds extremely reasonable. How do you think that'll affect […]
@PaulJohnson I think this sounds interesting! I guess this would involve […] I wonder what the implementation in […] I believe you'd still need to use a random seed, since with a known seed, an attacker could still pre-compute colliding inputs.
I don't see a need for […]
@PaulJohnson's idea doesn't work IMO. A nasty person will use the FNV-colliding inputs, forcing the hashmap to degenerate into just a normal hashmap with a stronger hash function. This probably won't DoS your system, but it will burn more CPU than needed.

Someone who has time (there seem to be plenty of people) could benchmark what would happen if, say, SHA-256 or BLAKE is used in HashMap versus Map, and at which point there is a payoff. The last time I did benchmarks myself, for […]

Recall, the adversarial input used in the blog post was 5M; that's something easy to limit too. (Your API request should fit in, say, 100k; data upload can be handled off to the side.)
@sjakobi points out that an attacker could pre-compute colliding inputs. That looks like a major issue, which makes the whole "collision-resistant hash" thing moot. Suppose we used a cryptographic hash function for maximum resistance. That hash still has to be reduced to N buckets. If you know N and you know the reduction algorithm, then you can find collisions among the N buckets just by exhaustive search, without needing to find collisions in the hash function as a whole. If N=100 then 1% of your trials will produce a desired collision, so producing a few million is not going to be difficult. In fact, given the O(n^2) collision algorithm in u-c, an attacker with a graphics card could probably do it faster than the target machine can process the data.

Let's suppose that my two-level hash idea is implemented, with only the strong hash having a random salt. Now the result of […]

I therefore withdraw my two-level hash proposal. This also means that we need to think about how easy it would be to recover the salt by observing the ordering in […]
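The bucket-reduction point is easy to demonstrate. In this toy (the mixer is a splitmix64-style finalizer standing in for any strong hash; all names are invented), knowing the bucket count and the reduction lets a plain linear scan collect as many same-bucket keys as desired, even though the full-width hash is a bijection with no collisions at all:

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- Stand-in for a strong hash: a splitmix64-style finalizer.
mix :: Word64 -> Word64
mix x0 =
  let x1 = (x0 `xor` (x0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      x2 = (x1 `xor` (x1 `shiftR` 27)) * 0x94d049bb133111eb
  in x2 `xor` (x2 `shiftR` 31)

-- The table's known reduction from hash value to bucket index.
bucketOf :: Int -> Word64 -> Int
bucketOf n h = fromIntegral (h `mod` fromIntegral n)

-- Exhaustively scan keys until `count` of them land in the target bucket.
collidingKeys :: Int -> Int -> Int -> [Word64]
collidingKeys n target count =
  take count [ k | k <- [0 ..], bucketOf n (mix k) == target ]
```

With N buckets, roughly 1/N of candidates hit the target, so the scan is cheap; only a salt that the attacker cannot learn changes this picture.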
@phadej suggests that vulnerable machines can limit their vulnerability by limiting the size of the input. Unfortunately that isn't really a solution. It is true that limiting the input to 100k down from 5M would make the problem 50^2 = 2,500 times smaller, but if you can send lots of 100k JSON items then you can still perform an effective DoS attack. And on the other hand, arbitrary size limits on inputs tend to create other headaches for software that automatically generates data to send. So at best it's just a workaround.
It's a pragmatic workaround. Unbounded input is always a security concern. Rate-limiting is a good idea as well (all APIs I have used recently are either rate-limited, charge per request, or both). Here we try to solve some very specific problems (that not everyone has) in the wrong place, IMO. The need for a hardened O(1)-lookup associative table feels contrived. What's wrong with `Data.Map`?
I don't actually have much time on this so I'll try to weigh in one more time but leave the discussion and the decisions to the current maintainer(s):

[…]
To add to the last point: one of the real challenges here is to fix the problem with DoS attacks due to hash collisions in code that didn't know it would have to care about such things. This is why the solution we (@bos and I) attempted tried to […] Unfortunately it didn't work out (we had performance issues and segfaults in trying to deal with them with low-level coding).
I had an idea for doing a per-HashMap salt which I've not seen in this discussion yet:

```haskell
-- same constants as hashable.
#if WORD_SIZE_IN_BITS == 64
type DefaultSalt = -2578643520546668380 -- 0xdc36d1615b7400a4
#else
type DefaultSalt = 0x087fc72c
#endif

data HashMapT (salt :: Nat) k v
  = Empty
  | BitmapIndexed !Bitmap !(A.Array (HashMapT salt k v))
  | Leaf !Hash !(Leaf k v)
  | Full !(A.Array (HashMapT salt k v))
  | Collision !Hash !(A.Array (Leaf k v))
  deriving (Typeable)

-- backwards compatibility
type HashMap = HashMapT DefaultSalt
```

Then modify the functions to be free of salt if they can, for example insert:

```haskell
insert :: forall k v salt . (Eq k, Hashable k, KnownNat salt)
       => k -> v -> HashMapT salt k v -> HashMapT salt k v
insert k v m = insert' (hash salt k) k v m
  where
    salt = natVal (Proxy :: Proxy salt)  -- KnownNat is needed for natVal
```

This allows the default HashMap with backwards compatibility, but also any other HashMapT. I think this solves the issue with having different salts in an intersection:

```haskell
intersection :: (Eq k, Hashable k) => HashMapT salt k v -> HashMapT salt k w -> HashMapT salt k v
```

Because salt is the same type variable for all arguments, it's enforced to be the same. Then you can also provide a function to resalt if the user ever ends up with different salts and still wants to do an intersect (which results in a reconstruction of the hashmap).
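The type-level enforcement can be seen in a stripped-down, self-contained toy (association lists instead of a HAMT; all names invented, not the proposal's real code): `intersectionS` forces both arguments to share the same `salt` type variable, so mixing differently-salted maps is a compile-time error.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Toy map carrying its salt as a phantom type-level natural.
newtype SaltedMap (salt :: Nat) k v = SaltedMap [(k, v)]

emptyS :: SaltedMap salt k v
emptyS = SaltedMap []

-- A real implementation would feed the salt into the hash; here we only
-- read it back to show it is available at runtime via KnownNat.
saltOf :: forall salt k v. KnownNat salt => SaltedMap salt k v -> Integer
saltOf _ = natVal (Proxy :: Proxy salt)

insertS :: Eq k => k -> v -> SaltedMap salt k v -> SaltedMap salt k v
insertS k v (SaltedMap kvs) = SaltedMap ((k, v) : filter ((/= k) . fst) kvs)

lookupS :: Eq k => k -> SaltedMap salt k v -> Maybe v
lookupS k (SaltedMap kvs) = lookup k kvs

-- Both arguments must have the SAME salt; different salts won't typecheck.
intersectionS :: Eq k => SaltedMap salt k v -> SaltedMap salt k w -> SaltedMap salt k v
intersectionS (SaltedMap xs) (SaltedMap ys) =
  SaltedMap [ (k, v) | (k, v) <- xs, k `elem` map fst ys ]
```

Attempting `intersectionS (m :: SaltedMap 1 k v) (n :: SaltedMap 2 k w)` is rejected by the type checker, which is exactly the guarantee the proposal relies on.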
I think you're going to find it hard to avoid boxing around that salt, but I'd be happy to be proved wrong. Regardless, that's only a small part of the problem.
... and then every use-site should propagate the salt, e.g. […]
@treeowl @phadej […]
I know these libraries will at least fail with the changes, because they have instances on `HashMap` which would now need to be `HashMapT salt`:

quickcheck-instances-0.3.25.2.drv
semirings-0.6.drv
relude-0.7.0.0.drv
semigroupoids-5.3.5.drv

I did this by stubbing out unordered-containers in a 100k LOC codebase to see what issues would come up in CI.
For lack of better ideas I implemented this: #321
I have rekindled #265 to discuss approaches for making the hash salt less predictable to attackers. I'm reluctant to invest much time into that debate while I'm not aware of a hash function that would make the whole fuss worthwhile though (see #319 (comment)). If anyone's interested, I think finding an appropriate hash function would be the best way to make progress on this issue. I noticed that rust currently uses SipHash 1-3:
@tibbe, is that the same SipHash variant that you tried? https://en.wikipedia.org/wiki/SipHash#Overview indicates that SipHash 2-4 might be more common, but also slower.
I did some digging: SipHash was still available in hashable in 2012: https://github.com/haskell-unordered-containers/hashable/tree/fea8260b9e0c0596fc7ef0c608364b3960649f26/cbits It's quite easy to change that code to SipHash-1-3 (the numbers just indicate the number of compression/finalization rounds). I don't know how recently that SipHash implementation was tested for performance, but it looks relatively easy to add it back into hashable and run the benchmarks once more. Perhaps it would be possible to speed it up? Maybe we can try? SipHash reference implementation (the paper explaining it is linked there as well)
@jappeace Good find! Yeah, it would be interesting to set up benchmarks with that code. The HighwayHash project also contains a supposedly faster SipHash implementation.
A little OT, but in case it comes up and since I haven't written about this anywhere: I started a project a few years ago named hashabler that aimed to do a few things:

[…]
Unfortunately I realized I didn't really understand (2) until after I'd made a couple of releases and towards the end of a big rewrite, etc. I managed to get the library about 2/3 of the way fixed up but lost steam and haven't had the energy to pick it up since. But (2) is interesting and I think not very well-understood. The short version is that a composable hashing library like […]

The point being: you can get collisions from your choice of hash function, but also from the Hashable instance itself, even in the presence of a perfect hash function (well, in theory. The obvious bad instances in […]

FWIW I'd like to brush that work off some day. In the interim I've had the thought that I could use backpack to allow the choice of hash function to be configurable, without it needing to be part of the regular public API (so e.g. […]

You're welcome to steal with attribution my siphash implementation here if helpful: https://github.com/jberryman/hashabler/blob/master/src/Data/Hashabler/SipHash.hs . It looks like in my version locally I've got a somewhat faster implementation that uses handwritten asm (only because ghc doesn't have bitwise rotate primops).
For reference: […]

@NorfairKing's blog post describes a HashDoS attack on `aeson` which is enabled by `u-c`'s O(n) handling of hash collisions and its use of the same default salt for all hashing operations. While possible mitigation measures for `aeson` are discussed in haskell/aeson#864, `u-c` should also prepare a proper response and possibly implement security features for users who are affected via `aeson` or in other ways.

In particular, I hope to find answers for the following questions in this thread (and possibly in separate sub-threads):

1. […] `u-c` users enable in the short term?
2. […] `u-c`?
2a. If yes, to what extent should we trade performance and API bloat for security features?
3. […] `u-c`?

I'd also like to point out that I have very limited knowledge and experience with security issues, so I'd be very grateful if more experienced people could chime in and share their advice. :)