
Semantic skeletons design #1067



Open: wants to merge 28 commits into main


aphillips
Member

This design document contains the proposed design for including semantic skeletons into Unicode MessageFormat's functions.

This PR is based on @sffc's [doc](https://docs.google.com/document/d/1s7GeN5V0cnw9B1erfMHTWwmnxz0EJkq2v78ZjzRq2t8/edit)
@aphillips aphillips added the design Design document or issues related to design label Apr 6, 2025
@aphillips aphillips requested review from eemeli and sffc April 6, 2025 15:13
Comment on lines 115 to 118
2. It should be possible to format field-based time types
(e.g. those that contain separate values per field type in a date/time, such as a year-month)
3. It should be possible to format [floating time](https://www.w3.org/TR/timezone/#dfn-floating-time) values
(e.g. those that are not tied to a specific time zone)
Collaborator

Why do we care about the input value's type? Are these not implementation concerns?

Member Author

These are implementation concerns. But they generate requirements for date/time formatting options.

Collaborator

Really? Could you give an example, because I don't see how.

Member Author

One such problem is the need to manage an input value's relationship to the timeline. While temporal types try to pack all of the information into strongly typed objects, classical time values are usually timestamps (seconds or milliseconds of epoch time). Formatting them requires a time zone (or at least an offset), which means there needs to be some external way of expressing the zone when it differs from the runtime's default time zone.

Delivery time local to you and not this web server will be: {$timestamp :datetime timezone=$usersTimeZone}
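To make the timestamp point concrete, here is a minimal java.time sketch (an illustration only, not part of the design): an epoch-millisecond value has no year, month, or hour fields until a zone is applied, so the same value yields different wall-clock fields per zone. The `zoneId` argument stands in for the `timezone` option in the message above; the class and helper names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class TimestampFormatting {
    private static final DateTimeFormatter FIELDS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Hypothetical helper: an epoch timestamp only gains wall-clock fields
    // once a zone is supplied (cf. timezone=$usersTimeZone in the message).
    static String format(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of(zoneId))
                .format(FIELDS);
    }

    public static void main(String[] args) {
        long ts = 1747303260000L; // 2025-05-15T10:01:00Z
        System.out.println(format(ts, "UTC"));                 // 2025-05-15 10:01
        System.out.println(format(ts, "Europe/Berlin"));       // 2025-05-15 12:01
        System.out.println(format(ts, "America/Los_Angeles")); // 2025-05-15 03:01
    }
}
```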

There is also a need to be able to "float" a value: remove it from the timeline so that field values stay constant regardless of time zone. This involves removing the time zone from the value, the equivalent of ZonedDateTime.toLocalDateTime() in Java:

Your birthday is: {$birthday :datetime offset=none} (option name made up on the spot)
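A similar sketch for floating values, again assuming java.time: `ZonedDateTime.toLocalDateTime()` drops the zone, so the fields stay as written, whereas moving the same instant into another zone can change even the date. `floatValue` is a hypothetical helper name.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class FloatingValues {
    // Hypothetical helper: dropping the zone "floats" the value, so its
    // field values no longer depend on any time zone.
    static LocalDateTime floatValue(ZonedDateTime zoned) {
        return zoned.toLocalDateTime();
    }

    public static void main(String[] args) {
        ZonedDateTime tokyo =
                ZonedDateTime.of(2025, 5, 15, 0, 30, 0, 0, ZoneId.of("Asia/Tokyo"));

        // Moving the same instant to another zone changes the fields...
        System.out.println(
                tokyo.withZoneSameInstant(ZoneId.of("America/New_York")).toLocalDate()); // 2025-05-14

        // ...while the floating value keeps them as written.
        System.out.println(floatValue(tokyo).toLocalDate()); // 2025-05-15
    }
}
```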

Comment on lines 119 to 120
4. Date/time formatters should not permit users to format fields that don't exist in the value
(e.g. the "month" of a time, the "hour" of a date)
Collaborator

Again, isn't this an implementation concern? I don't see what this has to do with the shape of the formatting options.

Member

@sffc sffc Apr 22, 2025

This is one of the main benefits of the semantic skeleton design, and I think points 1-4 can all be merged into this one. The point is that not all date types are the same and not all formatters are the same, and the date type needs to be able to expose all fields that the formatter needs.
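One way to picture "the date type needs to be able to expose all fields that the formatter needs" is a capability check. The sketch below assumes a hypothetical mapping from field-set letters (Y, M, D, E) to java.time's `ChronoField` and uses the real `TemporalAccessor.isSupported` to decide whether a value can be formatted with a given field set. It is an illustration, not how any MF2 implementation is specified to work.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.YearMonth;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.util.Map;

class FieldAvailability {
    // Hypothetical mapping from field-set letters to java.time fields.
    private static final Map<Character, ChronoField> FIELDS = Map.of(
            'Y', ChronoField.YEAR,
            'M', ChronoField.MONTH_OF_YEAR,
            'D', ChronoField.DAY_OF_MONTH,
            'E', ChronoField.DAY_OF_WEEK);

    // True only if the value exposes every field the field set asks for;
    // unknown letters are treated as unsatisfiable.
    static boolean canFormat(TemporalAccessor value, String fieldSet) {
        for (char c : fieldSet.toCharArray()) {
            ChronoField field = FIELDS.get(c);
            if (field == null || !value.isSupported(field)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canFormat(LocalDate.of(2025, 5, 15), "YMDE")); // true
        System.out.println(canFormat(YearMonth.of(2025, 5), "YM"));       // true
        System.out.println(canFormat(YearMonth.of(2025, 5), "YMD"));      // false: no day
        System.out.println(canFormat(LocalTime.of(10, 1), "M"));          // false: no month
    }
}
```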

```
{$date :datetime dateFields="YMD"}
{$date :datetime date="YMD"}
{$date :datetime fields="YMD"}
```

Collaborator

Are `dateFields`, `date` and `fields` just different possible names for the same option? Or do they mean different things? (Same question about `timePrecision` vs. `time` below.)

Member

Different possible names for the same option.

#### TimePrecision

Collaborator

I think this needs a little more explanation (it should be possible to follow this doc without reading the other linked-to docs).

Member Author

This part of the spec is basically empty and in need of work. I started to noodle around with it today. Each one of the options will need quasi-complete descriptions in order to see how they'll work and the relative usability of each.

Member

@sffc sffc left a comment

The more I think about it, the more I think we should either go all-in on 4 functions or all-in on 1 function. The "middle ground" :date, :time, :datetime just seems flawed to me: it is trying to be type-safe but it fails at being type-safe.

Comment on lines +167 to +169
1. Date/time formatters should permit users to specify the desired width of individual fields
in a manner similar to classical skeletons,
while relying on locale data to prevent undesirable results.
Member

I want to be clearer about what is "required" versus "wanted".

I think the requirement is that users can specify an overall length; what is "wanted" is the ability to hint at the width of an individual field independently of the overall length.
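A rough java.time analogue of the "required" part: `DateTimeFormatter.ofLocalizedDate(FormatStyle)` lets the caller choose only an overall length and leaves the per-field choices to locale data. The per-field width hint (the "wanted" part) has no counterpart in this sketch. Exact output strings depend on the JDK's locale data, so none are hard-coded here; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class OverallLength {
    // The caller picks only an overall length; the locale's data decides
    // how each field (month, day, year, weekday) is rendered at that length.
    static String format(LocalDate date, FormatStyle length, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(length).withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2025, 5, 15);
        for (Locale locale : new Locale[] {Locale.US, Locale.GERMANY, Locale.JAPAN}) {
            for (FormatStyle length : FormatStyle.values()) {
                System.out.println(locale + " " + length + ": " + format(date, length, locale));
            }
        }
    }
}
```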

- Better at documenting the message author's intention

_Cons_
- _Lots_ of functions
Member

But only 1 more than the previous option. The only one this adds is :zoneddatetime.


`:date`, `:time`, `:datetime`, `:zoneddatetime`, *maybe* `:zoneddate`, `:zonedtime`, `:timezone`

Problem: Most users are likely to prefer date/time/datetime to zoneddate/zonedtime/zoneddatetime
Member

Great, so if naming is the only problem here, let's bikeshed the names and make another set of names. Let's try not to deviate too much from java.time and Temporal. :)

Member Author

You mean like java.time and JS Temporal just did with Local vs. Plain? 🤣

Let's not overlook the fact that lots of platforms still don't have temporal types for date/time values, and that platforms that do have temporal types also have timestamps and classical time values that require formatting. The problem I'm calling out here is absolutely about bikeshedding the names/options, in order to ensure that we get the most usable syntax in MF. That shouldn't deviate one whit from what our Java/JS friends (who are mostly us, after all) have done, except to make the concepts portable.

Comment on lines 251 to 253
Problem: Different platforms cannot agree on what to call a Floating Time Value: HTML and Java use `LocalXXX`,
JavaScript has adopted `PlainXXX`,
some others use different terms, such as `CivilXXX`.
Collaborator

I don't understand the relevance of this. Isn't this purely an implementation concern, which has no visibility in the MF2 syntax? As I understand it, the idea is for the function names to describe the formatted output, not the input.

Member Author

A reasonable approach would be to name the functions consistently with usage. The point here is that there is no agreement on what to call these.

Collaborator

I still don't understand how naming concerns about an input value type would transmit to the syntax. For example, I would expect an expression for a time with hours and minutes to look something like

{$t :datetime hour=numeric minute=numeric}

or maybe

{$t :time fields=hm}

irrespective of whether $t held a floating time value, or one with a timezone attached, or any other representation of the value to be formatted.

Or do you think we ought to have different functions for each input datetime type supported by an implementation?
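The input-independence expectation can be illustrated with java.time, where one `HH:mm` formatter accepts a floating `LocalTime`, a floating `LocalDateTime`, or a zoned or offset value and produces the same hour and minute. The `fields=hm` option is the hypothetical syntax from the comment above; mapping it to a Java pattern is an assumption made only for illustration.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

class InputIndependence {
    // One "hour and minute" formatter, loosely analogous to {$t :time fields=hm}.
    private static final DateTimeFormatter HOUR_MINUTE = DateTimeFormatter.ofPattern("HH:mm");

    // Works for any value that exposes hour-of-day and minute-of-hour.
    static String format(TemporalAccessor t) {
        return HOUR_MINUTE.format(t);
    }

    public static void main(String[] args) {
        TemporalAccessor[] inputs = {
            LocalTime.of(10, 1),                  // floating time
            LocalDateTime.of(2025, 5, 15, 10, 1), // floating date-time
            ZonedDateTime.of(2025, 5, 15, 10, 1, 0, 0, ZoneId.of("Europe/Berlin")),
            OffsetDateTime.of(2025, 5, 15, 10, 1, 0, 0, ZoneOffset.ofHours(2))
        };
        for (TemporalAccessor t : inputs) {
            // Prints 10:01 for every input type.
            System.out.println(t.getClass().getSimpleName() + " -> " + format(t));
        }
    }
}
```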

Member Author

This is capturing some of the conversation in Monday's call. Ultimately this isn't the place for it.

Some people, such as @sffc, have expressed a desire for the functions to be "type safe". This might mean naming functions for the underlying behavior, e.g. :localdate or :localtime (or whichever prefix wins: s/local/[civil|plain|...]/g).

Member

By "type safe" I mean that there should be well-defined semantics around when implementations should emit a Bad Operand error. For example, if you pass an implementation-specific "time-only" type to :date, you should get a Bad Operand error.
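A minimal sketch of that notion of type safety, assuming java.time values as operands: a hypothetical `:date` handler checks whether the operand supports a date field and raises a stand-in for the Bad Operand error otherwise. `BadOperandException`, `requireDate`, and `requireTime` are made-up names, and using `EPOCH_DAY`/`NANO_OF_DAY` as the test is an assumption.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

class BadOperandCheck {
    // Hypothetical stand-in for the MF2 Bad Operand error.
    static class BadOperandException extends RuntimeException {
        BadOperandException(String message) {
            super(message);
        }
    }

    // Hypothetical :date operand check: the value must carry a calendar date.
    static TemporalAccessor requireDate(TemporalAccessor operand) {
        if (!operand.isSupported(ChronoField.EPOCH_DAY)) {
            throw new BadOperandException(":date needs an operand with date fields");
        }
        return operand;
    }

    // Hypothetical :time operand check: the value must carry a time of day.
    static TemporalAccessor requireTime(TemporalAccessor operand) {
        if (!operand.isSupported(ChronoField.NANO_OF_DAY)) {
            throw new BadOperandException(":time needs an operand with time fields");
        }
        return operand;
    }

    public static void main(String[] args) {
        requireDate(LocalDate.of(2025, 5, 15));            // accepted
        requireTime(LocalDateTime.of(2025, 5, 15, 10, 1)); // accepted
        try {
            requireDate(LocalTime.of(10, 1));              // time-only value passed to :date
        } catch (BadOperandException e) {
            System.out.println("Bad Operand: " + e.getMessage());
        }
    }
}
```

Whether a date-time operand should satisfy `:date` or `:time` is exactly the kind of rule the design needs to pin down; this sketch accepts it.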

@aphillips aphillips marked this pull request as ready for review May 12, 2025 17:22
Start working on error section, add references to ICU4X field sets, including a table
Collaborator

@eemeli eemeli left a comment

This seems fine. The document still covers input datetime values in far more detail than I think is relevant to choosing between something like "semantic skeleton" and "field options".

Currently, my preferred option would be for us to end up with the functions :date, :time, and :datetime, each of which would format what they describe. This would mean that it would not be possible to use :datetime to output only a time or a date; both would always be present.

With this approach, we could have each of the following:

{$t :time precision=second}

{$t :date fields=month-day-weekday}

{$t :datetime dateFields=month-day-weekday timePrecision=second}

I find the single-letter abbreviations of the date fields mostly workable, except E standing for "weekday" is completely opaque. Using the fields' spelled-out names is longer, but would allow a translator to understand what they represent.
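For comparison, translating between the spelled-out names and single-letter codes is mechanical. The sketch below assumes the correspondence year=Y, month=M, day=D, weekday=E (E being the weekday code mentioned above); the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FieldNames {
    // Assumed correspondence between spelled-out names and letter codes.
    private static final Map<String, Character> CODES = new LinkedHashMap<>();
    static {
        CODES.put("year", 'Y');
        CODES.put("month", 'M');
        CODES.put("day", 'D');
        CODES.put("weekday", 'E');
    }

    // "month-day-weekday" -> "MDE"
    static String toCodes(String names) {
        StringBuilder codes = new StringBuilder();
        for (String name : names.split("-")) {
            Character code = CODES.get(name);
            if (code == null) {
                throw new IllegalArgumentException("unknown field name: " + name);
            }
            codes.append(code);
        }
        return codes.toString();
    }

    // "YMD" -> "year-month-day"
    static String toNames(String codes) {
        StringJoiner names = new StringJoiner("-");
        for (char c : codes.toCharArray()) {
            String match = null;
            for (Map.Entry<String, Character> entry : CODES.entrySet()) {
                if (entry.getValue() == c) {
                    match = entry.getKey();
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("unknown field code: " + c);
            }
            names.add(match);
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCodes("month-day-weekday")); // MDE
        System.out.println(toNames("YMD"));               // year-month-day
    }
}
```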

I don't think the inclusion or not of the timezone in the output merits separate :zoned* variants of the functions; that should instead be controlled by a timeZoneName option as in JS. That option should be supported on :datetime and :time, but probably not on :date.
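Under the `timeZoneName` approach, the zone becomes one more piece of the formatted output rather than a separate function. A sketch in java.time, where a hypothetical `timeZoneName` argument only toggles the `z` pattern letter; the option value `short`, the helper name, and the hard-coded English pattern (used only for brevity) are assumptions, and the zone abbreviation printed depends on the JDK's locale data.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class TimeZoneNameOption {
    // Hypothetical option handling: "short" appends a short zone name
    // (pattern letter z); any other value, including null, omits the zone.
    static String formatTime(ZonedDateTime value, String timeZoneName) {
        String pattern = "short".equals(timeZoneName) ? "h:mm a z" : "h:mm a";
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(value);
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.of(2025, 5, 15, 10, 1, 0, 0, ZoneId.of("Europe/Berlin"));
        System.out.println(formatTime(t, null));    // time without the zone
        System.out.println(formatTime(t, "short")); // same time, short zone name appended
    }
}
```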

I think the inclusion (or not) of input override fields like timeZone and offset should be considered as a separate discussion from this one, which is focusing on how to represent the formatted output.

I don't think we should allow for a resolved :date value to be used as the operand of :time or :datetime, or a resolved :time value to be used as the operand of :date or :datetime, even if that might be technically possible in some implementations. I would, however, allow a resolved :datetime value to be used as the operand of :date or :time, with each only caring about the options relevant to it. This would allow for a message to use a single basket of options to e.g. apply an appropriate timezone, while avoiding ambiguities about what it means to format the :date of a :time.
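The operand rules in the last paragraph can be written down as a small table. This is a sketch of the proposal as described, with one assumption added: that a function also accepts a resolved value of its own kind (e.g. `:date` of a resolved `:date`), which the comment does not say. All names are hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class OperandRules {
    enum DateTimeFunction { DATE, TIME, DATETIME }

    // Resolved-value kinds each function accepts as its operand.
    private static final Map<DateTimeFunction, Set<DateTimeFunction>> ACCEPTS =
            new EnumMap<>(DateTimeFunction.class);
    static {
        // A resolved :datetime may feed :date or :time; a resolved :date or
        // :time may not feed a different function. Same-kind reuse is assumed.
        ACCEPTS.put(DateTimeFunction.DATE, EnumSet.of(DateTimeFunction.DATE, DateTimeFunction.DATETIME));
        ACCEPTS.put(DateTimeFunction.TIME, EnumSet.of(DateTimeFunction.TIME, DateTimeFunction.DATETIME));
        ACCEPTS.put(DateTimeFunction.DATETIME, EnumSet.of(DateTimeFunction.DATETIME));
    }

    static boolean accepts(DateTimeFunction function, DateTimeFunction resolvedOperand) {
        return ACCEPTS.get(function).contains(resolvedOperand);
    }

    public static void main(String[] args) {
        for (DateTimeFunction f : DateTimeFunction.values()) {
            for (DateTimeFunction op : DateTimeFunction.values()) {
                System.out.println(":" + f.name().toLowerCase(Locale.ROOT)
                        + " of resolved :" + op.name().toLowerCase(Locale.ROOT)
                        + " -> " + (accepts(f, op) ? "ok" : "error"));
            }
        }
    }
}
```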

@sffc
Member

sffc commented May 15, 2025

I agree with @eemeli's comment (😃) except I think we need more discussion on

> I don't think the inclusion or not of the timezone in the output merits separate :zoned* variants of the functions; that should instead be controlled by a timeZoneName option as in JS. That option should be supported on :datetime and :time, but probably not on :date.

I've stated previously that I don't understand the mental model for having :date, :time, and :datetime but not :datetimezone (or :zoneddatetime). Just as a date is fundamentally different from a time, and a datetime is fundamentally different from a date or a time, a zoned datetime is fundamentally different from a date, time, or datetime. If we split date, time, and datetime into individual functions, why would we not do the same for the zoned versions?

@eemeli
Collaborator

eemeli commented May 15, 2025

> If we split date, time, and datetime into individual functions, why would we not do the same for the zoned versions?

I'd say it's because the timezone is a part of the time specifier. I do not see including or not the timezone as introducing a fundamental difference in the formatted output, though it's obviously a significant part of the input value that's being formatted.

@sffc
Member

sffc commented May 15, 2025

  1. May
  2. May 15
  3. May 15, 10 AM
  4. May 15, 10:01 AM
  5. May 15, 10:01 AM CEST

Can you explain a mental model for why you draw a line between only 2 and 3?

@eemeli
Collaborator

eemeli commented May 15, 2025

Without a time, a date is describing a whole day or an even wider range. With a time, we're describing a specific instant. We speak of them differently.

So, for example, we don't think of "May 15" as implicitly meaning midnight or noon or any other time, but as the whole day, during which multiple things could happen one after the other. Meanwhile, "May 15, 10 AM" conceptually expands to mean 10:00:00 on that day, a single moment in time. If it isn't tied to a timezone clearly enough by context, we might want to include one for disambiguation, but we're still speaking about a specific instant of time.
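The distinction can be illustrated in java.time: once a zone is applied, a bare `LocalDate` such as May 15 corresponds to a half-open range of instants (the whole day), while a `LocalDateTime` such as May 15, 10:00 corresponds to a single instant. The helper names are hypothetical; the example zone is Europe/Berlin, which is on UTC+2 in May.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;

class DayVersusInstant {
    // A date on its own: the whole day, as the half-open range
    // [start of day, start of next day) in the given zone.
    static Instant[] dayRange(LocalDate date, ZoneId zone) {
        return new Instant[] {
            date.atStartOfDay(zone).toInstant(),
            date.plusDays(1).atStartOfDay(zone).toInstant()
        };
    }

    // A date with a time: one instant in the given zone.
    static Instant instant(LocalDateTime dateTime, ZoneId zone) {
        return dateTime.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        ZoneId berlin = ZoneId.of("Europe/Berlin");
        Instant[] may15 = dayRange(LocalDate.of(2025, 5, 15), berlin);
        System.out.println(may15[0] + " .. " + may15[1]);          // 2025-05-14T22:00:00Z .. 2025-05-15T22:00:00Z
        System.out.println(Duration.between(may15[0], may15[1])); // PT24H
        System.out.println(instant(LocalDateTime.of(2025, 5, 15, 10, 0), berlin)); // 2025-05-15T08:00:00Z
    }
}
```

On a DST transition day the same range would be 23 or 25 hours long, which is another way the "whole day" reading differs from a single instant.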
