Skip to content

Remove recursion, speed up modulus #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Oct 7, 2018
Merged

Remove recursion, speed up modulus #12

merged 10 commits into from
Oct 7, 2018

Conversation

delfrrr
Copy link
Owner

@delfrrr delfrrr commented Sep 29, 2018

@mourner
Copy link
Contributor

mourner commented Sep 29, 2018

@delfrrr interesting! Do you know why the mod change would have an impact if almost all the time the operands are >= 3?

Also, I'm not sure non-recursive version is safer since you cap it by 100 manually, while call stack size is dependent on architecture but is usually higher.

@delfrrr
Copy link
Owner Author

delfrrr commented Sep 30, 2018

@mourner fast_mod_3 has no performance advantages (I introduced it during profiling and then did not delete); it simply does modulus;

But these changes help:

  • % m_hash_size -> fast_mod
  • a - a % 3 -> 3 * (a / 3)

About stack size: is there a way to estimate it from algorithm and number of points perspective?

@delfrrr
Copy link
Owner Author

delfrrr commented Oct 1, 2018

  • removed fast_mod_3 to not rise confusion
  • increased stack size and tested on 100M points

@mourner I was thinking about stack limit vs recursion; recursion is generally considered as not good thing; I increased stack length to 1024 (8Kb) and tested with 100M points

@mourner
Copy link
Contributor

mourner commented Oct 1, 2018

@delfrrr good to know! I've also tried eliminating recursion in the JS version earlier, but it was slower for some reason. Might need another attempt — maybe it was slower because of dynamic stack rather than a fixed-size one.

Note that stack length will be low on random input, but legalization depth may increase dramatically on degenerate input (e.g. a ton of points on a circle). However I believe 1024 is as safe as recursion as the max goes, so definitely worth it.

return diff2 < 0;
} else {
return diff3 < 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW note that this code is now simpler in JS — basically (d1 - d2) || (i - j).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, forgot that mapbox/delaunator#33 makes this irrelevant.

@@ -22,7 +22,7 @@ inline std::string read_file(const char* filename) {
}
}

inline std::vector<double> get_geo_json_points(std::string const& json) {
inline std::vector< double> get_geo_json_points(std::string const& json) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to make this change?

@delfrrr
Copy link
Owner Author

delfrrr commented Oct 3, 2018

@mourner make it closer to mapbox/delaunator#35 ; no performance wins, though 🤔

@@ -194,9 +193,10 @@ class Delaunator {
double m_center_x;
double m_center_y;
std::size_t m_hash_size;
std::size_t m_edge_stack[EDGE_STACK_SIZE];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, is it possible to make the stack static so that you don't allocate a 1024-size array on every triangulation (even for small input)?

Copy link
Owner Author

@delfrrr delfrrr Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a matrix options: (stack OR heap) X (static OR singleton);

  • stack, singleton is probably not possible
  • stack, static -- means we allocate array even no triangulation is ever called; not sure it's better
  • heap static and singleton probably will be slower then stack as it will be less compact and less likely in CPU cache;

cc @flippmoke @cleeus

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge PR, but feedback is welcome; I something will pop, I will add a separate issue

Copy link
Contributor

@mourner mourner Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack, static -- means we allocate array even no triangulation is ever called; not sure it's better

This is what JS does, although I reduced it to 512 items (2kb). Seems small enough not to matter much when done once, and much better when you e.g. have to triangulate thousands of different small inputs.

@flippmoke
Copy link
Contributor

I would highly suggest against using fixed sized arrays here and rather just continue to use std vector.

@delfrrr
Copy link
Owner Author

delfrrr commented Oct 6, 2018

@flippmoke for some reason I was sure that vector will perform slower, but now I tried and it performs the same; now it uses vector and there is now stack size limitations

cc @mourner

@mourner
Copy link
Contributor

mourner commented Oct 6, 2018

@delfrrr excellent! Let's merge then.

@delfrrr delfrrr merged commit c1521f6 into master Oct 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants