Request for Guidance on Normalization Rules Enforcement #842

EzequielPostan opened this issue Jun 16, 2023 · 13 comments

@EzequielPostan

Issue Description

The current version of the DID Core specification (https://www.w3.org/TR/did-core/#services) states that the value of the serviceEndpoint property MUST be a string, map, or a set composed of one or more strings and/or maps. Additionally, it specifies that all string values MUST be valid URIs conforming to RFC 3986 and normalized according to the Normalization and Comparison rules in RFC 3986 and any normalization rules in its applicable URI scheme specification.

The issue at hand is that RFC 3986 does not provide an explicit list of normalization steps, and different libraries enforce different additional normalization rules. As we are implementing a new DID method, where users will be submitting DID creation and DID update events, we find ourselves in a dilemma: we want to ensure compliance with the specification, but we lack clear guidance on the specific normalization rules to enforce.

Without a shared list of normalization rules followed by all implementations of DID methods, we are unsure whether to shift the responsibility for normalization to users and let them decide which rules they require. However, this approach conflicts with the specification (we would be allowing users to produce non-compliant DID documents).

Request

We kindly request guidance on the following options:

  1. Could it be possible to update the W3C DID Core specification to include an explicit list of normalization rules, accompanied by a comprehensive test suite? or,
  2. Could it be possible to remove the normative enforcement for normalization, allowing implementers to determine the level of normalization they wish to enforce?

Thank you for your attention and support

@kdenhartog
Member

kdenhartog commented Jun 20, 2023

Could it be possible to update the W3C DID Core specification to include an explicit list of normalization rules, accompanied by a comprehensive test suite?

At this time no, but this could be possible in an updated version of the specification. There is currently discussion about rechartering the working group, so if we take this route we'll have to wait until that gets decided before this can move forward.

Could it be possible to remove the normative enforcement for normalization, allowing implementers to determine the level of normalization they wish to enforce?

This would also require a normative change, which would be a class 4 change. That is W3C lingo for changing normative statements, and it means we need a specific type of WG to make it.

A potential solution that might be available right now to us would be to update the requirements of the registry so that any registered service endpoint is required to define this. cc @msporny to see what his thoughts on this might be.

Also, @pchampin, it looks like I'm still a part of a W3C group that allows me to triage tickets in this repo, I think, because I can close this issue and assign myself (but can't update labels). Could you take a look at what group that might be and remove me? I'm unlikely to participate in the WG at that level anymore due to time commitments, so that authorization can be removed from my GH account.

@msporny
Member

msporny commented Jun 20, 2023

@kdenhartog wrote:

At this time no, but this could be possible in an updated version of the specification.

Yes to everything @kdenhartog wrote above. He is correct that changing a global standard isn't simple when you don't have an active working group that is chartered to make breaking changes. This is by design, to ensure that these global standards stay stable for long periods of time.

A potential solution that might be available right now to us would be to update the requirements of the registry so that any registered service endpoint is required to define this. cc @msporny to see what his thoughts on this might be.

Even updating the requirements to the registry would have to be done through an active WG, which we don't have right now. That said, this issue will stay open and will be addressed by that WG when it becomes active in the next couple of months.

My suggestion in the meantime is to use this issue to track the normalization rules for the URIs that you see people using in the wild. At present, the vast majority of these URIs can use the normalization rules defined in the WHATWG URL specification:

https://url.spec.whatwg.org/#concept-url-serializer

If you can find a URI scheme that can't work with the above, and doesn't have a spec w/ normalization rules for it, we could use this information to modify the language in the specification in the next charter.

@EzequielPostan, does this give you enough guidance to provide to your community?

@EzequielPostan
Author

Thank you for the replies.

My suggestion in the meantime is to use this issue to track the normalization rules for the URIs that you see people using in the wild.

The thing is that, in the wild, we don't see mentions of normalization rules. At a quick glance:

Sidetree's spec says:

The object MUST include a serviceEndpoint property, and its value MUST be either a valid URI string (including a scheme segment: i.e. http://, git://) or a JSON object with properties that describe the Service Endpoint further. If the values do not adhere to these constraints, the entire Patch Action MUST be discarded, without any of it being used to modify the DID’s state.

which enforces little to no normalization.
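
For illustration, a rough sketch of what a check limited to that Sidetree wording might look like. The function name is hypothetical, and treating "valid URI string" as "parses and has a scheme" is an assumption, not something taken from the Sidetree spec; the check is deliberately permissive and performs no normalization.

```python
from urllib.parse import urlsplit


def is_acceptable_service_endpoint(value) -> bool:
    """Hypothetical check: a URI string with a scheme, or a JSON object."""
    if isinstance(value, dict):
        # A JSON object that describes the endpoint further is accepted as-is.
        return True
    if isinstance(value, str):
        try:
            parts = urlsplit(value)
        except ValueError:
            return False
        # Only the presence of a scheme is checked; nothing is normalized.
        return bool(parts.scheme)
    return False


print(is_acceptable_service_endpoint("https://example.com/hub"))
print(is_acceptable_service_endpoint("example.com/hub"))
```

Two endpoints that differ only in host case or percent-encoding would both pass this check and would be stored as distinct strings.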

Another popular example, the did:peer spec, does not seem to mention normalization at all.

We haven't explored in full depth, but in general, DID methods' specs don't mention the topic.

  1. Is there any method, to your knowledge, enforcing extensive normalization rules on URIs?
  2. Are DID methods enforcing normalization without mentioning it in their specs?
  3. Is the universal resolver project checking/enforcing anything on this?

At present, the vast majority of these URIs can use the normalization rules defined in the WHATWG URL specification

The problem we face is not the lack of possible specs/RFCs/libraries/groups describing normalization rules. The issue is that, if the spec is not enforcing any clear specific rules and test vectors, then in practice it is enforcing none, because each method can simply select different sets.

Without a change to the spec, we may just not enforce any normalization, and leave users responsible for normalizing URIs if they see the need in their use cases.

Once again, thank you for the time to read and reply to this issue

@kdenhartog
Member

kdenhartog commented Jun 22, 2023

Is there any method, to your knowledge, enforcing extensive normalization rules on URIs?

To my knowledge, of the various DID methods I've read through, none has specified this beyond the extent that the Sidetree spec did.

Are DID methods enforcing normalization without mentioning it in their specs?

When we implemented this while I was at MATTR, we did some of our own implementation-level normalization for things like this, but it was highly specific to the use cases we wanted to use DIDs for. I suspect it's similar for others, so method authors are treating these as extension points of their method specs as well.

Is the universal resolver project checking/enforcing anything on this?

To my knowledge last I checked (been almost 2 years at this point) it's doing very basic normalization, but nothing beyond the scope of what's defined in DID Core. @peacekeeper would be able to speak to the latest for it though I presume.

The problem we face is not the lack of possible specs/RFCs/libraries/groups describing normalization rules. The issue is that, if the spec is not enforcing any clear specific rules and test vectors, then in practice it is enforcing none, because each method can simply select different sets.

Without a change to the spec, we may just not enforce any normalization, and leave users responsible for normalizing URIs if they see the need in their use cases.

Once again, thank you for the time to read and reply to this issue

I suspect this is probably the best way to go, given that service endpoints were always intended to be a bit more free-for-all to allow for good flexibility here. This is also part of the reason I was thinking this should be defined by the service endpoint registries rather than the methods themselves. Oftentimes the service endpoints are use case specific, so over-constraining them in did-core or the DID method specs is going to limit the use cases that can be built with service endpoints. However, the service endpoint registry (and the underlying specs being registered) would properly operate at the use case layer to get more specific about these types of concerns. Hence my thinking for doing this at that level: require, as a condition of registration, that each registered service endpoint spec define its own normalization rules.

msporny added the "class 3" label (Other changes that do not add new features) on Jul 1, 2024
msporny added the "ready for pr" label (Issue is ready for a PR) on Sep 19, 2024

@pchampin
Contributor

This was discussed during the #did meeting on 19 September 2024.

w3c/did-core#842 Request for Guidance on Normalization Rules Enforcement

manu: This is someone asking what the normalization rules are for URLs
… Letting implementors decide what level of normalization they support
… A response is that the normalization rules for URLs are clear and exist in the WHATWG URL spec
… We would need to check these apply cleanly to DID URLs
… We could say we are using the WHATWG normalization rules
… Others state on the issue that people in the field are normalizing in different ways. Very few specs say anything about this.

dmitriz: What is URL normalization?

<manu> These are the URL serialization rules in WHAT WG URL spec: https://url.spec.whatwg.org/#concept-url-serializer

manu: This is about percent encoding. Having dots in the URL path. There is a concept called URL serialization
… See the link above
… apply a series of rules to get to a normalized URL
… problem is DIDs don't have hosts. So we need to analyze this more deeply
… This group needs to see if these rules negatively impact DIDs
… If they don't we should normatively state the WHATWG are the normalization rules we follow
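
As a rough illustration of the "DIDs don't have hosts" point, the sketch below splits an HTTP URL and a DID URL with Python's standard `urllib.parse` (an RFC 3986 style parser, not a WHATWG one). Both example URLs and the helper name are hypothetical.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Hypothetical helper: return the generic URI components of a URL."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,  # authority component; empty when there is no "//"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(describe("https://example.com/path?service=files#frag"))
print(describe("did:example:123456/path?service=files#frag"))
```

With this parser the DID URL has an empty authority, and the method name and method-specific identifier end up inside the path component, so any host-oriented rule (such as lowercasing the host) has nothing to act on, while path-oriented rules would also touch the identifier.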

markus_sabadello: I have not looked at this in detail, but in the context of DID URL dereferencing, if we also have a path and query string on DID URLs in the same way as on http URLs, then my intuition is we should use the WHATWG rules

<manu> https://www.w3.org/TR/did-core/#dfn-serviceendpoint

manu: This is the location in the spec the issue is concerned with.

<dmitriz> we COULD sidestep the normalization of service endpoints issue. and require fully qualified URLs

<dmitriz> or not say anything about it.

<dmitriz> (which would mean removing the normalization requirement)

manu: Web browsers' URL normalization rules are different from the RFC3986 rules
… options are 1) remove it and not say anything about normalization. 2) Leave it as is and people have to do it, knowing the libraries they will likely use will do something different. 3) Or state that we use the WHATWG rules used by web browsers
… I don't have a strong feeling about the direction

ivan: removing the text is a problem for interoperability. We should not consider this
… I would see what is implemented in various programming environments
… As far as I know all of these use the WHATWG rules

<manu> +1 to what ivan said.

manu: I agree with ivan, let's look at the libraries and see what they do
… We should also generalize the spec text to say that any URLs should be normalized
… We will have to see what happens in specific DID URLs
… Unfortunately there are no tests/examples that show the differences between normalization rules
… We would also need our own tests against some fairly advanced DID URLs

decentralgabe: Let's continue this at TPAC


@w3cbot

w3cbot commented Dec 19, 2024

This was discussed during the #did meeting on 19 December 2024.

w3c/did-core#842

manu: Whoever takes this issue needs to see what happens when you try to normalize DID URLs with existing libraries.
… We should see what they do, make sure that the spec is aligned with what they do.
… This is a class-3 change; we were expecting URLs to be normalized, but were not specific about how this should be done.
… Ideally, we defer to WHATWG spec and say "this is how it should be done", but need to check what libraries actually do.
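
One small data point for such a survey, offered as a sketch rather than a finding of the group: parsing and re-serializing URLs with Python's standard `urllib.parse`. The helper names and sample URLs are hypothetical, and other libraries (in particular WHATWG URL implementations) are expected to behave differently.

```python
from urllib.parse import urlsplit


def roundtrip(url: str) -> str:
    """Parse a URL and serialize it again using only the standard library."""
    return urlsplit(url).geturl()


def survey(urls):
    """Hypothetical helper: pair each input with its round-tripped form."""
    return [(url, roundtrip(url), url != roundtrip(url)) for url in urls]


samples = [
    "HTTP://Example.COM/a/../b",
    "did:example:123/a/../b?x=%7e",
]
for original, result, changed in survey(samples):
    print(original, "->", result, "(changed)" if changed else "(unchanged)")
```

In this round trip the scheme is lowercased, while host case, dot segments, and percent-escapes are left as they were, so a library of this kind performs only a small part of what RFC 3986 describes.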

Wip: anyone willing to take that on?

KevinDean: I volunteer


@msporny
Member

msporny commented Mar 3, 2025

@KDean-Dolphin you volunteered to write a PR for this issue around Dec 19th during a WG call... I'm trying to clear all class3 issues and this is one of them. Do you have an ETA on when you might be able to write a PR to address this issue?

@msporny
Member

msporny commented Mar 28, 2025

@KDean-Dolphin, 2nd ping on this issue. Do you have an ETA for when you might be able to write a PR to address this issue?

@msporny
Member

msporny commented Apr 5, 2025

@KDean-Dolphin, 3rd ping on this issue. Do you have an ETA for when you might be able to write a PR to address this issue?

@KDean-Dolphin

@msporny Hi, catching up on my GitHub notifications, which I've been diligently ignoring for a while now...

I'm on it.

@KDean-Dolphin

The text has been moved to the Controlled Identifiers document. Original text:

The value of the serviceEndpoint property MUST be a string, a map, or a set composed of one or more strings and/or maps. All string values MUST be valid URIs conforming to [RFC3986] and normalized according to the Normalization and Comparison rules in RFC3986 and to any normalization rules in its applicable URI scheme specification.

New text:

The serviceEndpoint property is REQUIRED. The value of the serviceEndpoint property MUST be a single string, a single map, or a set composed of one or more strings and/or maps. Each string value MUST be a valid URL conforming to URL Standard.

Normalization has already been removed. However, I propose that we restore it and write it as follows (note that I'm restoring URI instead of URL, but if a service endpoint can only ever be a URL, we can undo that):

The serviceEndpoint property is REQUIRED. The value of the serviceEndpoint property MUST be a single string, a single map, or a set composed of one or more strings and/or maps. Each string value MUST be a valid URI conforming to [RFC3986]. Each URI SHOULD be normalized according to the guidance and rules provided by the Normalization and Comparison rules of RFC3986 and by any guidance and rules in its applicable URI scheme specification.

I would also add the following note:

Note: There is no canonical set of rules for URI normalization, as URI interpretation can vary by implementation. For example, an HTTP URI with query parameters is likely equivalent to the same URI with the query parameters in a different order, but that is not guaranteed and so should not be assumed.
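
To illustrate the query parameter example in the note, here is a small sketch that contrasts plain string comparison with an order-insensitive comparison of query parameters. The helper name and URLs are hypothetical, and treating reordered parameters as equivalent is exactly the kind of assumption the note warns about; a server is free to treat the two URIs differently.

```python
from urllib.parse import parse_qsl, urlsplit


def same_query_parameters(a: str, b: str) -> bool:
    """Hypothetical helper: equal apart from the order of query parameters."""
    pa, pb = urlsplit(a), urlsplit(b)
    # Every component other than the query must match exactly.
    if pa._replace(query="") != pb._replace(query=""):
        return False
    qa = sorted(parse_qsl(pa.query, keep_blank_values=True))
    qb = sorted(parse_qsl(pb.query, keep_blank_values=True))
    return qa == qb


first = "https://example.com/ep?x=1&y=2"
second = "https://example.com/ep?y=2&x=1"
print(first == second)
print(same_query_parameters(first, second))
```

The two comparisons disagree for this pair, which is why a normalization rule that reorders parameters cannot be applied safely without knowing how the endpoint interprets them.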

Let me know how you would like to proceed (close this as resolved due to existing edits, discuss on the next call, or proceed with the changes to CID).

@wip-abramson
Contributor

Let's discuss this on the call next week.

I do not think we will be able to make changes to the CID spec at this time due to it going through the final review process, but perhaps at a future date when/if the CID spec moves under the DIDWG.

@w3cbot

w3cbot commented Apr 24, 2025

This was discussed during the #did meeting on 24 April 2025.

Guidance on Normalization Rules Enforcement

<ottomorac> w3c/did#842

ottomorac: This issue relates to string URIs in the serviceEndpoint property
… manu requested we use the WHATWG working group spec for normalization
… KevinDean proposed a direction

manu: On the proposal from KevinDean, a couple of things to talk about
… First, what would it take to move the CID spec into this group. Trying to keep these specs aligned in different working groups and timelines is very difficult
… Second. The problem with URIs is we should move to the URL spec because the browser vendors have effectively killed URIs

<ivan> +1 to manu on URL vs URI/IRI

manu: I think using URIs and RFC3986 is not the best direction. It is the URL spec that WHATWG is in control of

<JoeAndrieu9> also endpoints must have an end, i.e., they actually must be locators (URLs) not just identifiers (URIs)

<smccown> +1 for moving to URLs

manu: I don't know about the normalization rules, will check the CID spec
… I think there are some normalization rules in the URL standard. If there are we can just use those
… If not we cannot use RFC3986. There will be all sorts of interop issues
… That is why the URL standard was created to address these
… I understand where you are trying to go Kevin. I think the longterm path is to move the CID spec into this group

* notes scribe lost audio

<ivan> /me there is a URL equivalence in the URL spec: https://url.spec.whatwg.org/#url-equivalence I did not see normalization

Kevin: there is no hard and fast set of rules for URL normalization....

<ottomorac> Kevin: happy to close this

<ottomorac> Kevin: once we have control of the CID spec we can change it later

manu: To answer the first question KevinDean asked. A DID is a URL per the definition of a URL in the URL standard
… I searched for the words canonicalization and normalization in the URL standard and they don't exist.

It says the spec defines canonical forms of ***
… The only examples I can see they give is for the windows drive path.
… There is a bunch of code point canonical rules e.g. % encoding
… You could argue there are normalization and canonicalization rules in the URL standard
… If it is not defined in the URL spec, then the DID spec is not going to define additional rules
… in the CID spec we currently do the right thing. We said the ID value must be a URL conforming to the URL standard
… We do say URI, I think mistakenly in the also known as
… This is the wrong thing if we want to say everything is a URL
… We might want to add to the spec to say, that all rules for normalization and canonicalization are defined in the URL standard. That addresses this issue

<JoeAndrieu9> https://url.spec.whatwg.org/

ivan: I also didn't find normalization in the URL spec.
… To take on the CID spec to be maintained, would require rechartering this WG

<Zakim> JoeAndrieu, you wanted to mention "serializers"

KevinDean: Given that the URL spec is meant to replace the URI spec, it is removing the normalization guidance provided in this spec

<Zakim> manu, you wanted to say, yes to a recharter - eventually :)

JoeAndrieu: In the URL spec, I think they define parsers and serializers. And serializers are handling the normalization, just not using those terms
… +1 to referring to that spec and removing a reference to URIs

manu: +1 to ivan and JoeAndrieu. There are enough algorithms in here for people to look at and recognise that these are canonicalization rules
… KevinDean's point is interesting, but again all browser vendors don't implement that stuff anymore
… Also, if you have a URL and that URL has query parameters. Position matters for those parameters now
… I don't think this changes anything. I still think we can point to the URL standard for the rules to address this issue
… Also, responding to ivan. Totally agree that we would have to recharter to take on the CID spec.
… This is a long term thing. In our next recharter we should do that.
… The best time to do that might be when we charter the DID method work
… We could send them both out in the same vote

dmitriz: minor point, removing myself from queue

ottomorac: Seems like we have some direction on this issue, thanks


7 participants