Better documentation on improving/parallelizing script performance #11978
Comments
There is no remedy for the runspace overhead. It's a question of balance: are the parallel workloads worth the expense of creating the runspaces? The only way to know is by measuring, which is demonstrated in the StackOverflow post. We have some articles about performance, but neither covers parallelism. I will see what we can do to add this information.
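As a rough illustration of the "measure it" advice above, here is a minimal sketch that times the same trivial workload sequentially and with ForEach-Object -Parallel. It assumes PowerShell 7 or later; the workload, the item count, and the throttle limit are placeholders chosen for illustration and do not come from the StackOverflow post.

```powershell
# Hypothetical workload: a very cheap per-item operation, chosen so that
# scheduling overhead rather than the work itself dominates the timing.
$items = 1..200

# Baseline: run every item in the current runspace.
$sequential = Measure-Command {
    $items | ForEach-Object { [math]::Sqrt($_) }
}

# Same work, but each item is dispatched to a separate runspace.
$parallel = Measure-Command {
    $items | ForEach-Object -Parallel { [math]::Sqrt($_) } -ThrottleLimit 4
}

# Summarize both timings side by side.
$timings = [pscustomobject]@{
    SequentialMs = [math]::Round($sequential.TotalMilliseconds, 1)
    ParallelMs   = [math]::Round($parallel.TotalMilliseconds, 1)
}
$timings
```

The numbers depend on the machine and on the workload, so the useful comparison is the one made with the real script block in place of the square root.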
@mklement0 & @santiq Do either of you have a problem with me reusing your responses in https://stackoverflow.com/a/73999295 to create a new article in our docs?
Fine with me. Thanks for tackling this.
@sdwheeler that santiago isn't me, but I'm okay with this. Thanks
@santiq Sorry 😉 @santisq Do you have a problem with me reusing your responses in https://stackoverflow.com/a/73999295 to create a new article in our docs?
@sdwheeler Hi Sean, of course not, feel free to use them as you'd like. Though note that the code provided there isn't ideal in that it doesn't provide the same capabilities as
@santisq Thanks! I will take a look at your module. For the scope of this article, my intent is to describe parallel options in PowerShell and show how to measure performance so you can choose an appropriate solution.
PowerShell Version
7.5
Summary
Details
The function I wanted takes a folder path as input and returns an array of all files in all subfolders, including files inside archives and inside archives nested in other archives. For each .dll and .exe it also returns the File Version and Assembly Version.
This is just so I can compare the impact of PR changes on the drop we produce.
We have a non-recursive implementation of that now. I tried to improve, speed up, and parallelize it so that it would do what I want, and I ran into the fact that ForEach-Object -Parallel is very expensive because of the runspaces it creates.
• I asked much smarter people than me to parallelize my script (Claude in VS Code), and with no further prompt than “ensure fastest execution” it used ForEach-Object -Parallel. That turned out to be a bad idea, with the function choking on large archives.
• I asked Claude to write a script from scratch and it still used ForEach-Object -Parallel.
• I came up with an implementation that spawns 7z.exe via Start-Process to unpack the necessary archives and files, and it executes much faster.
• I looked at the docs for ForEach-Object (Microsoft.PowerShell.Core) - PowerShell | Microsoft Learn. They talk about -Parallel being slow due to runspaces, but don’t list any remedy.
• I started writing this wanting to complain about the performance and about having to use a third-party program (7-Zip), but in the process of chasing loose ends I stumbled on the answer at https://stackoverflow.com/a/73999295, which mentions Start-ThreadJob as a lightweight alternative.
• I asked Claude to write the function using Start-ThreadJob, and it almost matches the performance of my 7-Zip implementation.
To me as a layman, the first thought is “to improve PowerShell performance I can use -Parallel”. I can see that it’s the same for Claude as well. The fact that -Parallel is incredibly expensive is not obvious.
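For readers who have not used Start-ThreadJob, the following is a minimal sketch of the batching approach described above, not the script from this issue. It assumes PowerShell 7 with the bundled ThreadJob module. The function name Get-BinaryVersion and the -BatchSize and -ThrottleLimit defaults are hypothetical, and archive handling is left out entirely. Giving each job a batch of files, rather than one job per file, is one way to spread the per-job startup cost over more work.

```powershell
# Hypothetical helper: reads version info for .dll/.exe files under a folder,
# handing batches of files to thread jobs instead of one job per file.
function Get-BinaryVersion {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $BatchSize = 200,      # assumed value; tune by measuring
        [int] $ThrottleLimit = 4     # assumed value; tune by measuring
    )

    # Collect the full paths of all candidate binaries.
    $files = @(Get-ChildItem -LiteralPath $Path -Recurse -File |
        Where-Object Extension -In '.dll', '.exe' |
        ForEach-Object FullName)

    $jobs = for ($i = 0; $i -lt $files.Count; $i += $BatchSize) {
        $last  = [math]::Min($i + $BatchSize, $files.Count) - 1
        $batch = $files[$i..$last]

        # The unary comma passes the whole batch as a single argument.
        Start-ThreadJob -ThrottleLimit $ThrottleLimit -ArgumentList (, $batch) -ScriptBlock {
            param([string[]] $batch)
            foreach ($file in $batch) {
                # Native (non-.NET) binaries have no assembly name; report $null.
                $assemblyVersion = try {
                    [System.Reflection.AssemblyName]::GetAssemblyName($file).Version.ToString()
                } catch { $null }

                [pscustomobject]@{
                    Path            = $file
                    FileVersion     = [System.Diagnostics.FileVersionInfo]::GetVersionInfo($file).FileVersion
                    AssemblyVersion = $assemblyVersion
                }
            }
        }
    }

    # Wait for all jobs, emit their output, and clean them up.
    if ($jobs) { $jobs | Receive-Job -Wait -AutoRemoveJob }
}
```

A call such as `Get-BinaryVersion -Path $PSHOME` returns one object per binary. Whether this beats a plain sequential loop still has to be confirmed by measuring on the real folder.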
Proposed Content Type
Concept, About Topic
Proposed Title
No response
Related Articles
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object?view=powershell-7.5#notes