Pinning parameters #56


Open · wants to merge 4 commits into base: master

Conversation

andrewfowlie

This is a draft proposal for a command-line argument in CmdStan to pin parameters, which originated from discussions on Discourse.

@WardBrian
Member

Thanks, this is a great starting point.

Here's a link for people who prefer to read the rendered markdown over the diff: https://github.com/andrewfowlie/design-docs/blob/pin-parameters/designs/0035-pin-parameters.md

@bob-carpenter
Member

#35 is already taken on a branch. We keep doing this, so maybe we should just give up numbering them?

@WardBrian
Copy link
Member

> #35 is already taken on a branch. We keep doing this, so maybe we should just give up numbering them?

What the Rust project does (and I think the structure of this repo is based on theirs?) is rename the numeric ID to the pull request number before merging. That seems like a good enough system to adopt when we need to break a 'tie', and it would also let you back-link from the number to the discussion if we did it consistently.

@bob-carpenter (Member) left a comment

Thanks so much for getting this started, @andrewfowlie.

A lot of these requested changes are simple superficial formatting and grammar issues, but some touch on actual content.

```
...
```

These are inelegant and unsatisfactory, partly as they require rewriting and recompiling a model.
Member

The example above does not require recompilation. And it's only rewriting if you wrote it once without anticipating the pinning. Instead, I'd just mention that this is a very clunky pattern to code and obfuscates the inherent generative structure of the model.
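
For reference, here is a minimal sketch of the kind of data-flag pattern under discussion, reconstructed from the `mu_is_data ? mu_val[1] : mu_param[1]` fragment quoted later in this review (the exact code in the draft may differ):

```
data {
  int<lower=0, upper=1> mu_is_data;     // 1 if mu is pinned to mu_val[1]
  array[mu_is_data] real mu_val;        // size 1 if pinned, size 0 otherwise
}
parameters {
  array[1 - mu_is_data] real mu_param;  // size 0 if pinned, size 1 otherwise
}
transformed parameters {
  // Stan's conditional operator only evaluates the branch it returns,
  // so the size-zero array is never indexed.
  real mu = mu_is_data ? mu_val[1] : mu_param[1];
}
```

Changing the pin only requires new data, not recompilation, which is the point being made here.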

```
pin=<filepath>
```
The value must be a filepath to a JSON file containing pinned values for some or all of the parameters in the parameters block.
Member

We need to mention that this is going to be the same JSON format used for other Stan files and provide a pointer:

https://mc-stan.org/docs/cmdstan-guide/json_apdx.html#creating-json-files
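
For instance, a pin file for a scalar mu and a length-3 vector alpha would presumably look like any other Stan JSON input (names and values here are just illustrative):

```
{
  "mu": 0.5,
  "alpha": [0.0, 0.0, 0.0]
}
```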


At present, there are two restrictions on parameters that can be pinned:

- you cannot pin a subset of elements of a vector; all elements must be pinned
Member

I'd number these given that it's preceded by "there are two". They are also full sentences, so should be capitalized and ended with a period. Then you don't need the : before.

This is all just super-nitpicky style stuff that doesn't really affect the spec itself.

At present, there are two restrictions on parameters that can be pinned:

- you cannot pin a subset of elements of a vector; all elements must be pinned
- you cannot pin constrained parameters
Member

You want to motivate why here so that the reader understands the restrictions. I think we can generalize a bit here so that you can pin constrained parameters that do not depend on any other parameters? For example, consider this program.

```
parameters {
  real a;
  real<lower = a> b;
}
```
  • If you pin b, it imposes an upper-bound constraint on a to keep things in order (b > a means a must stay below the pinned value). In general, those implied constraints are too hard to track.
  • If you pin a, I think everything should be fine.

We should run this by @WardBrian to see what the constraints are on the compiler side.

Member

I think an even more general thing we could allow is to pin any parameter on the unconstrained scale. Of course, we don't need to allow this explicitly, because a user could always write

```
parameters {
  real a;
  real b_raw;
}
transformed parameters {
  real b = lower_bound_jacobian(b_raw, a);
}
```

and pin b_raw.

We could try to compute the dataflow of which things you are allowed to pin or not (so that you could pin a, or a AND b, but not b by itself), but that would be very tricky, since it ultimately needs to be done at runtime when we actually see the pin.json, not in the compiler. So I think starting conservatively and maybe expanding later is best.

Member

> starting conservatively

Are you suggesting not allowing any constrained parameters to be pinned? If we can't pin scale parameters, it's going to eliminate the main use case put forward by @andrewgelman, which is pinning scale parameters in hierarchical models. What this would wind up doing is forcing users to define log scale parameters, pin those, and do their own transforms.

Member

My preference would be to disallow them entirely at first, yes. Even if we did allow simple cases like real<lower=0> scale; where there is no parameter dependence, we'd still need to decide if the user is providing the pinning on the constrained or unconstrained scale, and, if we want it to be on the constrained scale, we'd need to really think about how transform_inits is structured and if it would work for this use case.

The fact that the built-in transforms are available as functions now decreases some of the pain of disallowing this, since you can still get the same result if you don't mind the extra variable and the mandatory unconstrained representation for the pin value.

@nhuure

Pinned values should be on the constrained scale, just like custom initial values already are. Unconstrained scale is an implementation detail that users don't need to know about.

As for the C++ implementation, I think pinning must be handled in the model constructor, not transform_inits, and num_params_r__ counts the number of unpinned parameters.

Member

> That would require extra work if we want to support pinning in pathfinder etc.

Yeah, that's a real downside. I think we could minimize the overall code changes if we updated a few of the helpers like stan::model::log_prob_grad and stan::model::gradient, as these are used by all the services.

> it reads unpinned parameters from the deserializer (as before) but pinned parameters are just copied from the data, no deserializer needed

That would come at the cost of some pretty serious branching on every parameter read, right? Because we won't know at compile time which will be which. (Which also means we won't be able to store them as typed entries in the model class, but would need to do another round of deserialization somehow, unless we use something like std::optional.)

Obviously the compiler will need to change either way, since sorting it out in the algorithmic layer would require some extra metadata or helper functions for those offsets into the parameter vector, but I think the fact that it leaves log_prob more or less untouched is a big win.

@nhuure

> That would come at the cost of some pretty serious branching on every parameter read, right?

Yes, I'm imagining code that looks like

```
if (param_pinned__.has_value()) {
  param = param_pinned__.value();
} else {
  param = in__.read();
}
```

for every parameter (whose constraints don't depend on other parameters). Since pinning is a runtime decision, there must be branching on every parameter regardless of how it is implemented.

(and, yes, that's std::optional)

> I think the fact that it leaves log_prob more or less untouched is a big win

It leaves the API of log_prob unchanged either way; the question is on which side of BridgeStan you want to implement pinning.

Member

log_prob's API would be the same, but doing it in the algorithms would mean the entire generated code for log_prob is the same, with no extra branches. Because you'd get one branch per parameter name, I'm also hesitant to assume the branch predictor will do particularly well with them, even though they'd be basically constant for a given model run.

I also don't see how you could extend this to partially specifying a vector, for example, but a mask-in-the-algorithm approach ultimately doesn't care about that.

> on which side of BridgeStan do you want to implement pinning?

That question is actually what motivated my current thinking, because I realized someone working with BridgeStan to write their own sampler could already do pinning by basically masking the gradient you get back from log_prob. If you precompute that mask, you can even avoid any branching (by doing a bunch of multiplies instead).

So, my answer would be "outside". That has some direct benefits, like allowing a BridgeStan user who wants to experiment with pinning to instantiate only one copy of their model and data to try both pinned and unpinned options, but it's also the option that requires the least changes for us in BridgeStan (adding another option to the constructor would require either a new overload or a major version bump of our C API).

@nhuure

> I also don't see how you could extend this to partially specifying a vector

You add a new deserializer method that takes a mask in addition to the other constraints.

Member

This is getting way into the weeds of implementation, but I was imagining what would happen is that we would do a bind in the log density function to generate a new log density function, so that the inference systems (sampling, VI, optimization) don't need to know anything about it. This is what @WardBrian is suggesting for BridgeStan, for example.

That does leave open how to handle the equivalent of write_array---I think that's going to need to be done as @nhuure is suggesting.

[rationale-and-alternatives]: #rationale-and-alternatives


I think pinning parameters at runtime is far more elegant than existing solutions. At first, I had thought about a new keyword in the Stan language itself, e.g., in a parameter constraint
Member

Stay away from phrases like "I think" in design documents. There are better ways to hedge, but in this case, I don't think there's going to be much pushback that it's more elegant. Same for "At first, I had thought about"---that can just be "An alternative would be a new keyword in the Stan language itself." But we wouldn't do that with a keyword; we'd probably do it with an annotation if we want to go that way.

```
parameters {
  @pin(0)
  real mu;
  ...
```

The only situation where Stan requires you to write 0.0 instead of 0 is when you are doing integer arithmetic.

```
real mu = mu_is_data ? mu_val[1] : mu_param[1];
...
```
but even with `<pin=>`, pinning still requires one to change a model and recompile.
Member

New sentence. "Even with an annotation, ...".

A second drawback is that it doesn't let you change the value at runtime, so it's much less flexible than the explicitly coded version you started with.

```
with pm.do(m, {y: 100.0}) as m_do:
    idata_do = pm.sample_posterior_predictive(idata_m, var_names="z")
```

Member

It's really great that you included a PyMC example.

I don't know if we want to mention this, but @WardBrian found a bug in their implementation where they don't respect constraints. Probably not relevant for this discussion.

Do you know how general this is in PyMC? If not, we can ask their devs. For example, does PyMC let me set just one component of a vector?

Author

I don't know PyMC that well and I've never used this feature. But just playing around with it, it seems like it's all or nothing.


...

### Pin model parameters argument
Member

This section needs to make it clear that variables are going to be pinned or not pinned---they can't be partially pinned. Suppose I have the following program.

```
vector[10] a;
```

I can pin all 10 components of a or none of them. I can't just pin a[3] and a[7] and leave the others as parameters, for example.

Member

It's also worth noting that this restriction is primarily due to the lack of a good design, NOT due to mathematical concerns like the restriction on constrained parameters.

If we could agree on what the specification looks like for a JSON file that only specifies a[3] and a[7], we could plumb enough information through the compiler and the inference code to make it work.

Member

> this restriction is primarily due to the lack of a good design, NOT due to mathematical concerns like the restriction on constrained parameters.

It's fine mathematically to pin a constrained parameter. It's just too challenging on the Stan side to evaluate the implied constraints on remaining parameters and whether the result is even consistent or expressible in Stan.

In the same way, pinning just two values is more of a problem in terms of rewriting the Stan model than in terms of math. If we pin just two values, it's like this:

```
data {
  vector[2] pinned_a;
  ...
parameters {
  vector[8] unpin_a;
  ...
transformed parameters {
  vector[10] a;
  a[1:2] = unpin_a[1:2];
  a[3] = pinned_a[1];
  a[4:6] = unpin_a[3:5];
  a[7] = pinned_a[2];
  a[8:10] = unpin_a[6:8];
  ...
```

I don't know an equivalently simple workaround to the one @andrewfowlie listed in the original proposal for fully pinned parameters.

@andrewfowlie
Author

I've pushed fixes to most of these issues. Thanks for the feedback & advice.

@andrewfowlie
Author

I've just read through some of the extended discussion about the proposal. There seem to be two issues generating further discussion.

  1. That in this proposal constrained parameters cannot be pinned
  2. That in this proposal subsets of non-scalar parameters cannot be pinned

If I may, could I suggest a way forward?

Is it possible to add a 'Future plans/Outlook' section to the proposal? We could summarize some of what's been discussed here, and state explicitly that these aspects may be revisited in the future by additional proposals. This would allow further discussion to settle elsewhere, and record for future reference many of the good points & concerns raised so far.

On the other hand, I appreciate that I am very new here, so feel free to ignore my suggestion :)

@WardBrian
Member

> Is it possible to add a 'Future plans/Outlook' part of the proposal

I think it's typical to put those in the "Unresolved questions" section (e.g., "Can we support pinning individual elements of containers?" with some discussion underneath)

@bob-carpenter
Member

bob-carpenter commented May 6, 2025

[Edit: wasn't done]

I would distinguish between "unresolved questions" and "future work". It sounds to me like the proposal is to only allow pinning of whole unconstrained parameters first.

I would also like to see a discussion of where in the workflow we recommend pinning parameters. The reason I ask is that the main application I have seen is @andrewgelman pinning hierarchical scale parameters. In the centered parameterization, that would be pinning sigma in the following fragment

```
...
parameters {
  real<lower=0> sigma;
  vector[N] alpha;
  ...
model {
  alpha ~ normal(0, sigma);
  ...
```

In the non-centered parameterization, it'd be pinning sigma in

```
...
parameters {
  real<lower=0> sigma;
  vector<multiplier=sigma>[N] alpha;
  ...
model {
  alpha ~ normal(0, sigma);
  ...
```

The other case where this would apply would be pinning the varying effect alpha to a vector of zeros.

What I'm looking for is at least one realistic example in the design doc. Our goal is really to add useful features, which is why I'm pushing back on constrained parameters.

If we can't figure out how to pin constrained parameters, then @andrewgelman's example becomes this,

```
parameters {
  real log_sigma;
transformed parameters {
  real<lower=0> sigma = lb_constrain_jacobian(0, log_sigma);  // WARNING: I couldn't find name of this function
  ...
```

with the unconstrained log scale parameter log_sigma pinned from the outside rather than sigma.
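
Concretely, the pin file would then supply the unconstrained value, along the lines of (value hypothetical):

```
{ "log_sigma": -1.2 }
```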
