-
Notifications
You must be signed in to change notification settings - Fork 103
Remove recursion, speed up modulus #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@delfrrr interesting! Do you know why the mod change would have an impact if almost all the time the operands are >= 3? Also, I'm not sure non-recursive version is safer since you cap it by 100 manually, while call stack size is dependent on architecture but is usually higher. |
@mourner But these changes help:
About stack size: is there a way to estimate it from algorithm and number of points perspective? |
@mourner I was thinking about stack limit vs recursion; recursion is generally considered as not good thing; I increased stack length to 1024 (8Kb) and tested with 100M points |
@delfrrr good to know! I've also tried eliminating recursion in the JS version earlier, but it was slower for some reason. Might need another attempt — maybe it was slower because of dynamic stack rather than a fixed-size one. Note that stack length will be low on random input, but legalization depth may increase dramatically on degenerate input (e.g. a ton of points on a circle). However I believe 1024 is as safe as recursion as the max goes, so definitely worth it. |
return diff2 < 0; | ||
} else { | ||
return diff3 < 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW note that this code is now simpler in JS — basically (d1 - d2) || (i - j)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, forgot that mapbox/delaunator#33 makes this irrelevant.
@@ -22,7 +22,7 @@ inline std::string read_file(const char* filename) { | |||
} | |||
} | |||
|
|||
inline std::vector<double> get_geo_json_points(std::string const& json) { | |||
inline std::vector< double> get_geo_json_points(std::string const& json) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to make this change?
@mourner make it closer to mapbox/delaunator#35 ; no performance wins, though 🤔 |
include/delaunator.hpp
Outdated
@@ -194,9 +193,10 @@ class Delaunator { | |||
double m_center_x; | |||
double m_center_y; | |||
std::size_t m_hash_size; | |||
std::size_t m_edge_stack[EDGE_STACK_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, is it possible to make the stack static so that you don't allocate a 1024-size array on every triangulation (even for small input)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has a matrix options: (stack OR heap) X (static OR singleton);
- stack, singleton is probably not possible
- stack, static -- means we allocate array even no triangulation is ever called; not sure it's better
- heap static and singleton probably will be slower then stack as it will be less compact and less likely in CPU cache;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge PR, but feedback is welcome; I something will pop, I will add a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack, static -- means we allocate array even no triangulation is ever called; not sure it's better
This is what JS does, although I reduced it to 512 items (2kb). Seems small enough not to matter much when done once, and much better when you e.g. have to triangulate thousands of different small inputs.
I would highly suggest against using fixed sized arrays here and rather just continue to use std vector. |
@flippmoke for some reason I was sure that vector will perform slower, but now I tried and it performs the same; now it uses vector and there is now stack size limitations cc @mourner |
@delfrrr excellent! Let's merge then. |
Related reads:
https://stackoverflow.com/questions/33333363/built-in-mod-vs-custom-mod-function-improve-the-performance-of-modulus-op/33333636#33333636
https://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/
cc @mourner @cleeus