[css-fonts-5] Removing font-language-override #5484

Open
litherum opened this issue Aug 29, 2020 · 14 comments

Comments

@litherum
Contributor

font-language-override is only implemented by one engine, and has been at risk for a long time.

Philosophically, there shouldn't be two places authors can specify language to get correct text shaping.

We should remove font-language-override from the spec.

@svgeesus
Contributor

Yes, it does seem like a rather niche case (this font doesn't support language X, but if I tell the font that this is actually language Y it comes out better for language X).

I concur with dropping it (more on the undesirability than the single implementation).

@litherum
Contributor Author

litherum commented Sep 4, 2020

(This issue is about both the property and the descriptor.)

@jfkthame
Contributor

jfkthame commented Sep 4, 2020

> Philosophically, there shouldn't be two places authors can specify language to get correct text shaping.

I don't think it's quite as simple as this. The use case for font-language-override arises because of a mismatch between different things referred to (rather loosely) as "language". In HTML, authors can tag content with the lang attribute, normally thought of as "language" although it can carry additional subtags such as script and region, so it's really a locale identifier.

When it comes to text shaping, however, the functionality in OpenType fonts is driven via tags that are often referred to as "language", but are more formally called "language system" tags. This is not at all the same thing.

Quoting from https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags (emphasis added):

Language system tags identify the language systems supported in a OpenType Layout font. What is meant by a “language system” in this context is a set of typographic conventions for how text in a given script should be presented. Such conventions may be associated with particular languages, with particular genres of usage, with different publications, and other such factors. For example, particular glyph variants for certain characters may be required for particular languages, or for phonetic transcription or mathematical notation.

The OpenType tag is about a set of typographic conventions, not directly about language (although it is often possible to infer a reasonable default mapping from one to the other).

In principle, a given set of conventions may be shared across multiple scenarios. For instance, two different languages (perhaps unrelated) may happen to follow the same conventions. Language system tags can be registered on a perceived-need basis, however; as a result, there is no guarantee that each tag represents a distinct and unique set of conventions. Tags can, however, be registered with the intent of representing conventions that apply to multiple languages. In such cases, the documented description for the tag should reflect that intent.

It should also be noted that there may be more than one set of typographic conventions that apply to a given language.
Therefore, in several respects, language system tags do not correspond in a one-to-one manner with languages. Even so, many registered tags are intended to represent typographic conventions for a particular language. For cases in which a correlation exists between a tag and one or more languages, the language identities are documented here by reference to ISO 639-2 and ISO 639-3.

While many such correlations are documented, there is no claim to completeness, and given the complexity (and ever-evolving conventions) of human language and writing systems, it would be futile to expect it.

If information is available to an application declaring the language of text content, then the application may make use of that to select a default language system tag to be applied when displaying that text. It is preferable, however, to give users control over the choice of language system tag to be used. (Depending on the application scenario, such control may be given to content authors, to content readers, or to both.)

font-language-override exists precisely to give users control here, as recommended by the OpenType spec, recognizing that (a) it is impossible for a browser to correctly anticipate every mapping from language, as expressed in the HTML lang tag, to desired writing system conventions as expressed via OT language system; and (b) to require authors to artificially change the lang tag in order to access desired writing system conventions in a font would be actively harmful.

For example, the OT tag registry includes 5 different tags for Karen languages: BLK, KJP, KRN, KSW, PWO. An advanced Burmese font might support all 5 of these, with certain differences in glyphs and shaping behavior. However, there are many more than 5 languages and dialects within the Karen group, and in some cases writing conventions may not even be well-established or documented yet. An author should not have to mislabel content with the lang tag of one of the major Karen languages just to access their preferred rendering behavior. font-language-override allows content to be given an accurate lang tag, and separately allows the author to choose the desired rendering behavior when a font provides multiple options.

So I am opposed to dropping this. Yes, it's a niche use case, but it is a valid one; I strongly disagree with labelling it "undesirable".

@faceless2

I don't know enough to make a case for it either way, but we have this working, so there will be two shipping implementations at some point.

As a philosophical argument was raised, it's similar to the fallback requested for hyphenation in #5270. I think most of the arguments made here apply to that issue as well.

@svgeesus
Contributor

svgeesus commented Sep 4, 2020

cc @r12a

@r12a
Contributor

r12a commented Sep 8, 2020

I agree with @jfkthame. There must be plenty of minority or less-common languages which, labelled properly, would not trigger the shaping required from a font, whereas the OT tag could be used to indicate that "the rules appropriate to <some_other_language> work also for this language".

Here's another example. The Scheherazade font has the ability to turn this:

[Image: Screenshot 2020-09-08 at 16 00 32]

into this, for Kurdish (exactly the same code points):

[Image: Screenshot 2020-09-08 at 16 00 45]

Kurdish can be labelled using ku, but that's actually a macrolanguage in BCP-47 which groups together ckb (Central Kurdish), kmr (Northern Kurdish), and sdh (Southern Kurdish). That transformation will be applied if you label your content as lang="ckb" or lang="kmr", but not if it's labelled lang="sdh". Also, if your content is labelled lang="ku" (and it may well be), that transformation will not be applied either. However, if you use font-language-override: kur you will get the transformation, whatever language tag you use. (Note that kur is not a BCP-47 language tag, btw.)

So, basically, font-language-override is not a duplicate selector for language, it's a selector for a particular set of glyph transformations in a font, which happen to be grouped and labelled along linguistic lines (though not necessarily with BCP language tags), which can be applied if the appropriate lang attribute for the content doesn't produce the desired effect.
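
The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not how any engine actually does it: the `LANG_TO_LANGSYS` table and the `resolve_langsys` function are hypothetical names, and the table encodes only the Kurdish behaviour described in this comment.

```python
from typing import Optional

# Illustrative table only: it encodes the behaviour described in this comment
# (ckb and kmr select KUR; ku and sdh select nothing), not any engine's real data.
LANG_TO_LANGSYS = {"ckb": "KUR", "kmr": "KUR"}


def resolve_langsys(lang: str, override: Optional[str] = None) -> Optional[str]:
    """Return the OpenType language system tag to request, or None for the font's default."""
    if override is not None:
        # font-language-override wins regardless of the content language.
        return override
    primary = lang.split("-")[0].lower()  # "ku-Arab" -> "ku"
    return LANG_TO_LANGSYS.get(primary)


for lang in ("ckb", "kmr", "sdh", "ku"):
    print(lang, resolve_langsys(lang), resolve_langsys(lang, override="KUR"))
```

In this sketch the content keeps its accurate lang value in every case; only the tag requested from the font changes when an override is supplied.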

hth

@litherum
Contributor Author

Okay. How about this related question, then:

What would have to happen in order to make it acceptable to remove font-language-override? Adding more flexibility to BCP language tags? Something else?

@kojiishi
Contributor

I think we need a well-defined interoperable way to compute OpenType language system tags. Maybe in the OpenType spec or in CLDR?

@litherum
Contributor Author

This is coming up again, due to WebKit/WebKit#14837

A few new thoughts:

  1. Ideally it should be possible to achieve this via lang values like ja-Latn
  2. Almost all native text editors don't have an override just for shaping
  3. If lang isn't expressive enough, it seems like we should be going to either HTML or the IETF (which defines BCP 47) to make it expressive enough
  4. font-language-override has a pretty bad fallback story; there is no guarantee that the font the UA ends up choosing supports the value supplied. lang can and does affect font selection, though.

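
The fallback concern in point 4 can be illustrated with a small sketch. The font names, their tag sets, and the `applied_override` helper are all hypothetical; the only assumption is that an override has no effect on a font that does not implement the requested language system.

```python
from typing import Optional

# Hypothetical fonts and the language system tags each one implements.
FONT_LANGSYS = {
    "WebFontA": {"KUR", "SND", "URD"},
    "SystemFallback": {"URD"},
}


def applied_override(font: str, override: str) -> Optional[str]:
    """Return the override if the chosen font implements it, else None (font default)."""
    return override if override in FONT_LANGSYS[font] else None


print(applied_override("WebFontA", "KUR"))        # the intended web font honours the request
print(applied_override("SystemFallback", "KUR"))  # a fallback font silently ignores it
```

Nothing in this model lets the author detect or react to the second case, which is the gap the point describes.
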
@litherum
Contributor Author

litherum commented Jun 10, 2023

Looks like CLDR is adding support for OpenType language tags: https://unicode-org.atlassian.net/browse/CLDR-337 / MicrosoftDocs/typography-issues#1030

@jfkthame
Contributor

> Looks like CLDR is adding support for OpenType language tags: https://unicode-org.atlassian.net/browse/CLDR-337

The issue CLDR-337 appears to be closed as "out of scope", afaics.

@jfkthame
Contributor

Maybe it's the name of the property that is the stumbling-block here, to some extent. Would it be better if it were called font-typographic-convention? The initial value would be auto, meaning to use behavior implied by the content language (or any other available clues), but it would also allow an explicit choice of an OpenType "language system" tag (specified as a string).

A request for a non-auto typographic convention could be treated as an input to the font-matching algorithm, causing the UA to explicitly look for a font that supports the requested rendering. (This could of course equally well be done with font-language-override, though we haven't heard any call for this, afaik. The assumption has generally been that an author wanting this level of control would use it in conjunction with a specific webfont.)
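
As a rough sketch of what treating the request as an input to font matching could look like: the `match_font` function and the font data below are hypothetical, `None` stands in for auto, and real font matching also weighs per-character coverage, weight, style and so on, all of which are ignored here.

```python
from typing import Dict, List, Optional, Set


def match_font(
    families: List[str],
    langsys_by_font: Dict[str, Set[str]],
    convention: Optional[str],
) -> Optional[str]:
    """Prefer the first listed font that implements the requested convention."""
    if convention is not None:
        for name in families:
            if convention in langsys_by_font.get(name, set()):
                return name
    # auto, or no listed font supports the request: keep the usual first choice.
    return families[0] if families else None


fonts = {"SystemFallback": {"URD"}, "WebFontA": {"KUR", "SND", "URD"}}
print(match_font(["SystemFallback", "WebFontA"], fonts, "KUR"))
print(match_font(["SystemFallback", "WebFontA"], fonts, None))
```

The design choice in this sketch is that an unsupported request degrades to ordinary matching rather than failing, so an explicit value never leaves text without a font.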

@litherum
Contributor Author

> Would it be better if it were called font-typographic-convention?

I don't think so - typographic conventions are already applied by lang=.

6 participants