cmd/go, cmd/distpack: build and run tools that are not necessary for builds as needed and don't include in binary distribution #71867

Open
matloob opened this issue Feb 20, 2025 · 17 comments
Labels
GoCommand cmd/go Proposal Proposal-Accepted ToolProposal Issues describing a requested change to a Go tool or command-line program.
Milestone

Comments

@matloob
Contributor

matloob commented Feb 20, 2025

Proposal Details

This proposal is to stop including tools that are not needed for builds in the binary distribution. Instead the tools would be built and run as needed by go tool using a similar mechanism to that used to build and run tools declared with a tool directive in go.mod. The goal of this proposal is to reduce the go binary distribution size.

In cmd/distpack, we should be able to discard the following tools before packaging the binary distribution: addr2line, buildid, nm, objdump, pprof, test2json, and trace. These tools do not seem to be invoked by the go command. We could go a little further and also remove doc and fix too because they are only invoked by go doc and go fix respectively.
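As a rough illustration of the filtering step described above, here is a minimal sketch. It is not cmd/distpack's actual code: the helper name keepInBinaryDist and the archive-relative path layout are assumptions made for the example, and the tool list is the one named in this proposal.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// nonBuildTools lists the tools the proposal suggests dropping from the
// binary distribution because the go command does not invoke them for builds.
var nonBuildTools = map[string]bool{
	"addr2line": true, "buildid": true, "nm": true, "objdump": true,
	"pprof": true, "test2json": true, "trace": true,
}

// keepInBinaryDist is a hypothetical filter: it reports whether a file,
// named by its slash-separated path relative to GOROOT, should stay in the
// packaged distribution. Only binaries under pkg/tool/ are ever dropped;
// the tools' source under src/ is always kept.
func keepInBinaryDist(name string) bool {
	dir, base := path.Split(name)
	if !strings.HasPrefix(dir, "pkg/tool/") {
		return true
	}
	base = strings.TrimSuffix(base, ".exe")
	return !nonBuildTools[base]
}

func main() {
	files := []string{
		"bin/go",
		"pkg/tool/linux_amd64/compile",
		"pkg/tool/linux_amd64/objdump",
		"src/cmd/objdump/main.go",
	}
	for _, f := range files {
		fmt.Printf("%-32s keep=%v\n", f, keepInBinaryDist(f))
	}
}
```

Extending the list to doc and fix would only mean adding two more map entries in this sketch.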

The install target for main packages in cmd will not change, so go install will continue to install binaries to $GOROOT/pkg/tool/$GOOS_$GOARCH.

If a tool binary is present in that location, it will be run from there, but if not, the go command will fall back to building (if the tool binary isn't cached) and running the tool, similarly to how go tool builds and runs tools declared with the tool directive.

Potential issues we should watch out for and which I would be interested in hearing feedback about:

  • As @prattmic pointed out, users who install a tool, make changes to the source, and then try to use go tool to run it may be surprised that the tool that's run doesn't include their changes. So we may want to think a little more about what to do in that case.
  • Are there any use cases that depend on running the (non-build) tools directly from the pkg/tool/$GOOS_$GOARCH directory? This change could potentially break them.
  • Are there use cases that depend on running the (non-build) tools on systems that restrict which binaries are allowed to run and won't let users build and run their own binaries? Note that go test and go run wouldn't work on such systems either.

cc @dmitshur

@gopherbot gopherbot added this to the Proposal milestone Feb 20, 2025
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Feb 20, 2025
@gabyhelp gabyhelp added the ToolProposal label Feb 20, 2025
@seankhliao seankhliao added the GoCommand cmd/go label Feb 22, 2025
@mvdan
Member

mvdan commented Feb 22, 2025

I agree with the overall goal of this issue, but I wanted to note that currently go build -x does mention at least one of these tools directly by path, e.g. from one of my main package builds:

/home/mvdan/tip/pkg/tool/linux_amd64/buildid -w $WORK/b001/_pkg_.a # internal

Perhaps it is enough to rewrite the -x mechanism to show go tool buildid instead. It is likely worth searching through the rest of the codebase to see if these tools are mentioned elsewhere.

@seankhliao
Member

Does it need to change to go tool buildid? If it's the resolved path then it seems using the path to the cached binary would be more accurate for debugging / reproducing issues, especially if you have a bug that affects the tools.

@mvdan
Member

mvdan commented Feb 22, 2025

@seankhliao if the path gets changed to point to a cached binary I guess that would also be fine, but that would be a pretty long and random path, and it wouldn't work if the cache isn't warm, for example if it gets cleaned up manually or after a few days. I would personally prefer the shorter form; it will reproduce the build faithfully as long as none of the inputs, such as the source code, have changed. I don't have a strong opinion here, but -x needs to change one way or another.

@matloob personally, out of all the tools you list, I've only ever run pprof, trace, and objdump directly. I don't think I've ever run any of them directly via a full path.

> users who install a tool, make changes to the source, and then try to use go tool to run it may be surprised that the tool that's run doesn't include their changes. So we may want to think a little more about what to do in that case.

Could you clarify whether the source for these tools would still be shipped? That is, that the first invocation of e.g. go tool nm would build the local source and run a cached binary, but it wouldn't download anything.

If that's the case, wouldn't changing a tool's source code cause go tool to rebuild it thanks to the build cache's hashing of build inputs?

@ianlancetaylor
Member

@mvdan The "# internal" at the end of that line is meant to be a hint that cmd/go is producing that effect itself, but is not actually running buildid. The buildid command is a thin wrapper around the "cmd/internal/buildid" package, and cmd/go calls "cmd/internal/buildid" directly; it doesn't run the buildid program itself.

@matloob
Contributor Author

matloob commented Feb 24, 2025

@mvdan Yes, to preserve the ability to run scripts produced by go build -x, we should list go tool buildid instead of the path to buildid. There's a script test that tests this (cmd/go/testdata/script/build_dash_x.txt).

> Could you clarify whether the source for these tools would still be shipped? That is, that the first invocation of e.g. go tool nm would build the local source and run a cached binary, but it wouldn't download anything.

Yes, the source for the tools would still be shipped in GOROOT. The invocation of the tool wouldn't download anything.

> If that's the case, wouldn't changing a tool's source code cause go tool to rebuild it thanks to the build cache's hashing of build inputs?

If we default to running a tool that's present in $GOROOT/pkg/tool/$GOOS_$GOARCH instead of rebuilding it every time, then we might run an old version of it that doesn't take the latest changes into account. A user would have to reinstall the tool (or clean it) to get the behavior of running the latest tool. The reason I think we should still consider doing this is that some folks may want to have prebuilt tools always available so that they don't have to be rebuilt if they expire in the cache.

@aclements aclements moved this from Incoming to Active in Proposals Mar 26, 2025
@aclements
Member

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— aclements for the proposal review group

@aclements
Member

> As @prattmic pointed out, users who install a tool, make changes to the source, and then try to use go tool to run it may be surprised that the tool that's run doesn't include their changes. So we may want to think a little more about what to do in that case.

Maybe I'm misunderstanding this, but isn't this true today, too? If I change the source to cmd/compile and run go tool compile, it doesn't rebuild the compiler. (This behavior is important for things like toolstash, but I think only matters for the build tools, and we could probably accomplish that some other way if necessary.)

@aclements
Member

How much would this save in the download size (in particular, after compression)? At a glance, it looks like a fair amount, but it would be good to have some hard numbers.

@aclements
Member

Looking at pkg/tool, it looks like there are a few more things we could leave out: covdata, dist, distpack. There are a few things that are involved in the build, but aren't required to build themselves: cgo, preprofile. And vet is only needed if you're running tests.

@matloob
Contributor Author

matloob commented Apr 2, 2025

> Maybe I'm misunderstanding this, but isn't this true today, too? If I change the source to cmd/compile and run go tool compile, it doesn't rebuild the compiler. (This behavior is important for things like toolstash, but I think only matters for the build tools, and we could probably accomplish that some other way if necessary.)

Yes, it's the current behavior, but when the tool isn't installed we will build it from source. So users might be surprised by the difference in behavior between when a tool is installed in pkg/tool and when it isn't.

> How much would this save in the download size (in particular, after compression)? At a glance, it looks like a fair amount, but it would be good to have some hard numbers.

I measured a ~15M difference in the module zips that we use to download and run newer toolchains on darwin/arm64. I think I remember adding doc and zip increases the savings to ~17M, but I'd have to re-check that.

@matloob
Contributor Author

matloob commented Apr 2, 2025

> Looking at pkg/tool, it looks like there are a few more things we could leave out: covdata, dist, distpack. There are a few things that are involved in the build, but aren't required to build themselves: cgo, preprofile. And vet is only needed if you're running tests.

Dist and distpack should already be removed from the distribution: https://cs.opensource.google/go/go/+/master:src/cmd/distpack/pack.go;l=172;drc=fc5073bc155545dde4856cccdfcbb31880d1eb66

I wasn't sure if we could include covdata because I think we cache its output together with the test output and I think it's a bit more tricky to make guarantees on it. On the other hand, we don't seem to include its toolid as part of the build so maybe I'm misreading it?

We do include cgo's and preprofile's toolids (and also vet's) as part of the cache keys so if we want to run go tool cgo or go tool preprofile to execute them, we'd have to make sure we're recording the correct toolids for them.

@cherrymui
Member

I can think of a use case that could be impacted by this. Suppose a developer downloads a Go distribution and just wants to build binaries to ship somewhere else, without running them locally. Previously they only needed to run downloaded binaries. With this proposal, if they also want to, say, investigate a binary using objdump, they'd need to run a locally built binary. I can imagine that some environments (e.g. a corporate profile) allow only binaries downloaded from trusted sources (e.g. our macOS release binaries are signed) and either disallow locally built binaries or make running them difficult. So they won't be able to easily run go tool objdump.

Would this be a problem? This is sort of a weird use case. Do people actually do this?

@aclements
Member

> Yes, it's the current behavior, but the behavior when the tool isn't installed is that we will build from source. So users might get surprised by the change in behavior between when a tool is installed in pkg/tool and when it isn't.

That's fair. My sense is that this is fairly advanced usage, so maybe it doesn't matter much?

> I measured a ~15M difference in the module zips that we use to download and run newer toolchains on darwin/arm64. I think I remember adding doc and zip increases the savings to ~17M, but I'd have to re-check that.

For reference: go1.24.2.darwin-arm64.tar.gz is 73MB, so this is roughly a 20% savings.
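The "roughly 20%" figure follows from simple arithmetic on the numbers quoted in this thread. The snippet below is only a back-of-the-envelope check; savingsPercent is a hypothetical helper, and it treats the ~15M and ~17M figures as megabytes and mixes a module-zip measurement with a tar.gz size, so the result is approximate.

```go
package main

import "fmt"

// savingsPercent returns the saved size as a percentage of the total size.
func savingsPercent(savedMB, totalMB float64) float64 {
	return savedMB / totalMB * 100
}

func main() {
	// ~15 MB saved out of a 73 MB archive.
	fmt.Printf("%.1f%%\n", savingsPercent(15, 73)) // prints 20.5%
	// ~17 MB if the larger estimate holds.
	fmt.Printf("%.1f%%\n", savingsPercent(17, 73)) // prints 23.3%
}
```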

> Dist and distpack should already be removed from the distribution

👍

> We do include cgo's and preprofile's toolids (and also vet's) as part of the cache keys so if we want to run go tool cgo or go tool preprofile to execute them, we'd have to make sure we're recording the correct toolids for them.

Could we make this conditional on whether cgo is used/a PGO profile is present? IIUC, we don't use these tools otherwise.

These sorts of details certainly don't have to hold up the proposal.

@matloob
Contributor Author

matloob commented Apr 9, 2025

> That's fair. My sense is that this is fairly advanced usage, so maybe it doesn't matter much?

Yeah, I think it's unlikely a user will go install one of the tools unless they're actively developing them, or if they're trying to save the loading time on each run of the tool (like the time increases brought up by #71733). I'd definitely consider that advanced usage.

> We do include cgo's and preprofile's toolids (and also vet's) as part of the cache keys so if we want to run go tool cgo or go tool preprofile to execute them, we'd have to make sure we're recording the correct toolids for them.

> Could we make this conditional on whether cgo is used/a PGO profile is present? IIUC, we don't use these tools otherwise.

Currently the toolid is only included in the action id for an action that uses the tool (for example the preprofile tool id is included in the pgoActionID but not for every action), so it is conditional on whether cgo is used or a pgo profile is present.
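To make the idea concrete, here is a simplified sketch of an action ID that only mixes in the IDs of the tools a given action uses. It is an illustration, not cmd/go's real hashing code: the actionID function, its input format, and the example tool ID strings are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// actionID hashes an action's inputs together with the IDs of only those
// tools the action actually uses. Tools not listed in toolIDs have no
// influence on the result.
func actionID(inputs []string, toolIDs map[string]string) [sha256.Size]byte {
	h := sha256.New()
	for _, in := range inputs {
		fmt.Fprintf(h, "input %s\n", in)
	}
	// Sort tool names so the hash does not depend on map iteration order.
	names := make([]string, 0, len(toolIDs))
	for name := range toolIDs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Fprintf(h, "tool %s %s\n", name, toolIDs[name])
	}
	var id [sha256.Size]byte
	copy(id[:], h.Sum(nil))
	return id
}

func main() {
	inputs := []string{"main.go"}
	// An ordinary action does not use preprofile, so its ID ignores it.
	plain := actionID(inputs, nil)
	// A PGO action includes the preprofile tool ID (made-up values here).
	pgoV1 := actionID(inputs, map[string]string{"preprofile": "id-v1"})
	pgoV2 := actionID(inputs, map[string]string{"preprofile": "id-v2"})
	fmt.Printf("plain: %x\n", plain[:4])
	fmt.Printf("pgo differs across tool versions: %v\n", pgoV1 != pgoV2)
}
```

In this sketch, changing the preprofile ID invalidates only the action that lists it, which mirrors the conditional behavior described above.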

@aclements aclements moved this from Active to Likely Accept in Proposals Apr 9, 2025
@aclements
Member

Based on the discussion above, this proposal seems like a likely accept.
— aclements for the proposal review group

This proposal is to stop including tools that are not needed for builds in the binary distribution, with the goal of reducing the Go binary distribution size. Instead the tools would be built and run as needed by go tool using a similar mechanism to that used to build and run tools declared with a tool directive in go.mod.

The install target for main packages in cmd will not change, so go install will continue to install binaries to $GOROOT/pkg/tool/$GOOS_$GOARCH.

If a tool binary is present in that location, it will be run from there, but if not, the go command will fall back to building (if the tool binary isn't cached) and running the tool, similarly to how go tool builds and runs tools declared with the tool directive.

This is expected to reduce Go binary distribution size by ~20%.

@aclements
Member

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— aclements for the proposal review group

@aclements aclements moved this from Likely Accept to Accepted in Proposals Apr 16, 2025
@aclements aclements changed the title from "proposal: cmd/go, cmd/distpack: build and run tools that are not necessary for builds as needed and don't include in binary distribution" to "cmd/go, cmd/distpack: build and run tools that are not necessary for builds as needed and don't include in binary distribution" Apr 16, 2025
@aclements aclements modified the milestones: Proposal, Backlog Apr 16, 2025
@gopherbot
Contributor

Change https://go.dev/cl/666476 mentions this issue: cmd/go: change go tool to build tools missing from GOROOT/pkg/tool

Projects
Status: Accepted
8 participants