Rewrite inlining pass #1935

Open: wants to merge 6 commits into master

vouillon
Member

No description provided.

@vouillon vouillon force-pushed the inlining branch 3 times, most recently from 840420d to 7b64a79 Compare April 14, 2025 23:08
@vouillon vouillon force-pushed the inlining branch 3 times, most recently from 79446f9 to ba1a622 Compare April 16, 2025 15:48
@vouillon vouillon force-pushed the inlining branch 4 times, most recently from b62b39e to 5cb6652 Compare April 24, 2025 17:49
@vouillon vouillon marked this pull request as ready for review April 24, 2025 17:50
@hhugo
Member

hhugo commented Apr 25, 2025

I've pushed a fixup to the testsuite.
We should check how this PR affects functor-heavy programs (something using core, maybe).
@TyOverby could you test this PR on your side?

@hhugo
Member

hhugo commented Apr 25, 2025

We need a changelog entry

@hhugo
Member

hhugo commented Apr 25, 2025

I'm not certain I read the benchmark correctly.
It seems that partial render table sees a code size increase of 10%, a memory increase of ~50%, and a compilation time increase of 30%, for no runtime improvement.

@hhugo
Member

hhugo commented May 6, 2025

Maybe we can wait for #1962 to get better measurements.

@hhugo
Member

hhugo commented May 7, 2025

We don't have the latest benchmark.
The last result we have shows a runtime regression for ocamlc (maybe some noise?).

With this PR, we seem to double the time spent in inline. We can probably live with that.

@hhugo hhugo force-pushed the inlining branch 2 times, most recently from 10a1ba8 to 6aaf9ad Compare May 7, 2025 21:59
@hhugo
Member

hhugo commented May 8, 2025

@vouillon, we need to investigate the macos failure.
Can we do anything to reduce compilation time with --opt 1?

@TyOverby, gentle ping on testing this PR.

This PR definitely needs a changelog entry.

@hhugo
Member

hhugo commented May 9, 2025

> @vouillon, we need to investigate the macos failure.

It could be an issue with node 24.

@vouillon
Member Author

vouillon commented May 9, 2025

> @vouillon, we need to investigate the macos failure.

The test passes a lazy list to a function. This value is unfolded into something pretty large. The function is no longer inlined (we avoid inlining functions containing loops at toplevel, since we expect that the toplevel code will not get optimized). Since node retains function parameters, the value is not garbage-collected and node eventually runs out of memory.
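
A minimal sketch of the pattern described above may help. It is a hypothetical reduction, not the actual source of the failing test: the `stream` type and the `range`, `iter_stream`, and `count` names are assumptions made for illustration. The head of a lazy list is passed as an argument, the consumer forces every tail, and all forced cells stay reachable from that head for as long as the argument is kept alive.

```ocaml
(* Hypothetical reduction of the pattern; not the actual test source. *)
type 'a stream = Nil | Cons of 'a * 'a stream Lazy.t

(* Lazily enumerate the integers in [i, n). *)
let rec range i n = if i >= n then Nil else Cons (i, lazy (range (i + 1) n))

(* Tail-recursive consumer. It is a loop in disguise, so under the heuristic
   described above it would presumably not be inlined at toplevel. *)
let rec iter_stream f s =
  match s with
  | Nil -> ()
  | Cons (x, rest) ->
      f x;
      iter_stream f (Lazy.force rest)

let count = ref 0

let () =
  (* The head cell is passed as a function argument. Each forced tail stays
     reachable from the head, so if the runtime keeps the argument alive for
     the whole call, no cell can be collected until the call returns. *)
  iter_stream (fun _ -> incr count) (range 0 1000);
  Printf.printf "visited %d elements\n" !count
```

With a stream of 1000 elements this is harmless; the problem described above only shows up when the unfolded value is very large.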

@vouillon
Member Author

vouillon commented May 9, 2025

I need to go through this code again and add some comments.
I'll see whether I can reduce compilation time with --opt 1.

@hhugo
Member

hhugo commented May 9, 2025

> @vouillon, we need to investigate the macos failure.
>
> The test passes a lazy list to a function. This value is unfolded into something pretty large. The function is no longer inlined (we avoid inlining functions containing loops at toplevel, since we expect that the toplevel code will not get optimized). Since node retains function parameters, the value is not garbage-collected and node eventually runs out of memory.

Downgrading to node 23 fixes the CI

@hhugo
Member

hhugo commented May 9, 2025

There is a large increase in maxresident between node 23 and node 24.
node 23

$ /bin/time node _build/default/compiler/tests-ocaml/match-exception/streams.bc.js
iter_stream with handler case (match) is tail recursive
3.79user 0.52system 0:04.00elapsed 107%CPU (0avgtext+0avgdata 2077648maxresident)k
0inputs+0outputs (0major+510186minor)pagefaults 0swaps

node 24

$ /bin/time node _build/default/compiler/tests-ocaml/match-exception/streams.bc.js
iter_stream with handler case (match) is tail recursive
3.77user 0.92system 0:02.89elapsed 162%CPU (0avgtext+0avgdata 3259072maxresident)k
0inputs+0outputs (0major+825840minor)pagefaults 0swaps
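
To put the two runs side by side, here is a small sketch that computes the relative increase from the maxresident figures above. It assumes the values are in kilobytes, as GNU time reports them; the names are hypothetical. The result is an increase of roughly 57%, from about 2.0 GiB to about 3.1 GiB.

```ocaml
(* maxresident values copied from the two /bin/time runs above, assumed KB. *)
let node23_kb = 2_077_648
let node24_kb = 3_259_072

(* Relative increase of peak resident memory, in percent. *)
let increase_pct =
  100. *. float_of_int (node24_kb - node23_kb) /. float_of_int node23_kb

(* Convert kilobytes to GiB for readability. *)
let gib_of_kb kb = float_of_int kb /. (1024. *. 1024.)

let () =
  Printf.printf "node 23: %.2f GiB, node 24: %.2f GiB, increase: %.1f%%\n"
    (gib_of_kb node23_kb) (gib_of_kb node24_kb) increase_pct
```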

@TyOverby
Collaborator

TyOverby commented May 9, 2025

Apologies for the delay; I didn't see this thread for a while. We should have some test and benchmark results ready for you next week.

@TyOverby
Collaborator

TyOverby commented May 9, 2025

out of curiosity, what was the osx / node-24 issue? consuming too much memory?

@hhugo
Member

hhugo commented May 9, 2025

> Apologies for the delay; I didn't see this thread for a while. We should have some test and benchmark results ready for you next week.

Many improvements landed on master in the past few days. It would be nice to test both the base and the tip of the PR so we can understand the impact of the PR alone.

@hhugo hhugo force-pushed the inlining branch 3 times, most recently from 1e96ca7 to 29f7ef3 Compare May 14, 2025 11:29
@vouillon
Member Author

> out of curiosity, what was the osx / node-24 issue? consuming too much memory?

@TyOverby See my comment above.

@vouillon
Member Author

> I'm not certain I read the benchmark correctly. It seems that partial render table sees a code size increase of 10%, a memory increase of ~50%, and a compilation time increase of 30%, for no runtime improvement.

Right, the aggressive inlining of functors does not really seem to result in any runtime improvement with js_of_ocaml, so it is now enabled only with wasm_of_ocaml.

vouillon added 6 commits May 15, 2025 21:29
- We are a lot more aggressive at inlining functor-like functions in
wasm_of_ocaml, since this may enable further optimizations
- We are more cautious at inlining nested functions, since this can
result in memory leaks
- We inline a larger class of small functions
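
The heuristics listed above can be pictured with a toy decision function. This is only an illustration of how such rules might combine, not the code of the pass: every type, field, and threshold below is hypothetical, and the real criteria in the PR may differ.

```ocaml
(* Toy model only: every type, field, and threshold here is hypothetical. *)
type target = Js_of_ocaml | Wasm_of_ocaml

type candidate = {
  body_size : int;            (* rough size of the callee body *)
  functor_like : bool;        (* behaves like a functor application *)
  has_nested_closures : bool; (* inlining may make closures retain more data *)
  contains_loop : bool;
}

let small_threshold = 20 (* arbitrary number chosen for the sketch *)

let should_inline ~target ~at_toplevel c =
  if at_toplevel && c.contains_loop then
    (* Toplevel code is not expected to be optimized, so keep loops out of it. *)
    false
  else if c.functor_like then
    (* Aggressive functor inlining is modeled as wasm_of_ocaml only. *)
    target = Wasm_of_ocaml
  else if c.has_nested_closures then
    (* More cautious: demand a smaller body than for plain functions. *)
    c.body_size <= small_threshold / 2
  else c.body_size <= small_threshold

let big_functor =
  { body_size = 200; functor_like = true;
    has_nested_closures = false; contains_loop = false }

let () =
  Printf.printf "wasm: %b, js: %b\n"
    (should_inline ~target:Wasm_of_ocaml ~at_toplevel:false big_functor)
    (should_inline ~target:Js_of_ocaml ~at_toplevel:false big_functor)
```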