meta — Metadata Table (OpenType 1.8.1)
The metadata table contains various metadata values for the font. Different categories of metadata are identified by four-character tags. Values for different categories can be either binary or text.
Table Formats
The metadata table begins with a header, structured as follows.
Metadata header:
Type | Name | Description |
---|---|---|
uint16 | majorVersion | Major version number of the metadata table — set to 1. |
uint16 | minorVersion | Minor version number of the metadata table — set to 0. |
uint32 | flags | Flags — currently unused; set to 0. |
uint32 | (reserved) | Not used; set to 0. |
uint32 | dataMapsCount | The number of data maps in the table. |
DataMap | dataMaps[dataMapsCount] | Array of data map records. |
The data map record has the following format.
DataMap record:
Type | Name | Description |
---|---|---|
Tag | tag | A tag indicating the type of metadata. |
Offset32 | dataOffset | Offset in bytes from the beginning of the metadata table to the data for this tag. |
uint32 | dataLength | Length of the data, in bytes. The data is not required to be padded to any byte boundary. |
The data for a given record may be either textual or binary. The representation format is specified for each tag. Depending on the tag, multiple records for a given tag or multiple, delimited values in a record may or may not be permitted, as specified for each tag. If only one record or value is permitted for a tag, then any instances after the first may be ignored.
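As an illustration of the header and data map layout tabulated above, the following is a minimal sketch of a reader, not a reference implementation. The names `parse_meta` and `DataMap` are hypothetical. The sketch assumes the table bytes have already been extracted from the font, that all fields are big-endian as elsewhere in OpenType, and that an unknown major version should be rejected.

```python
import struct
from typing import List, NamedTuple


class DataMap(NamedTuple):
    """One decoded data map record (hypothetical helper type)."""
    tag: str
    data: bytes


# Header: majorVersion, minorVersion, flags, reserved, dataMapsCount
_HEADER = struct.Struct(">HHIII")
# DataMap record: tag, dataOffset, dataLength
_RECORD = struct.Struct(">4sII")


def parse_meta(table: bytes) -> List[DataMap]:
    """Return the data maps of a 'meta' table, with their payload bytes."""
    if len(table) < _HEADER.size:
        raise ValueError("meta table is shorter than its header")
    major, _minor, _flags, _reserved, count = _HEADER.unpack_from(table, 0)
    if major != 1:
        # Assumption: this sketch only understands major version 1.
        raise ValueError(f"unsupported meta major version {major}")

    maps = []
    for i in range(count):
        pos = _HEADER.size + i * _RECORD.size
        if pos + _RECORD.size > len(table):
            raise ValueError("data map array runs past the end of the table")
        raw_tag, offset, length = _RECORD.unpack_from(table, pos)
        # Offsets are measured from the beginning of the metadata table.
        if offset + length > len(table):
            raise ValueError("data for a record runs past the end of the table")
        maps.append(DataMap(raw_tag.decode("latin-1"), table[offset:offset + length]))
    return maps
```

The payload is returned as raw bytes because the representation (text or binary) depends on the tag; decoding is left to the caller.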
Metadata Tags
Metadata tags identify the category of information provided and representation format used for a given metadata value. A registry of commonly-used tags is maintained, but private, vendor-determined tags can also be used.
Like other OpenType tags, metadata tags are four unsigned bytes that can equivalently be interpreted as a string of four ASCII characters. Metadata tags must begin with a letter (0x41 to 0x5A, 0x61 to 0x7A) and must use only letters, digits (0x30 to 0x39) or space (0x20). Space characters must only occur as trailing characters in tags that have fewer than four letters or digits.
Privately-defined metadata tags must begin with an uppercase letter (0x41 to 0x5A), and must use only uppercase letters or digits. Registered metadata tags must not use that pattern, but can be any other valid pattern.
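The tag rules above can be sketched as two small checks. The function names `is_valid_tag` and `is_private_tag` are hypothetical. The sketch assumes a strict reading in which a private tag consists of exactly four uppercase letters or digits with no trailing spaces; the text does not say explicitly whether padded private tags are allowed.

```python
import re

# A letter, then up to three letters or digits, then only trailing spaces.
_VALID_TAG = re.compile(r"[A-Za-z][A-Za-z0-9]{0,3} {0,3}")
# Assumption: private tags are four characters with no space padding.
_PRIVATE_TAG = re.compile(r"[A-Z][A-Z0-9]{3}")


def is_valid_tag(tag: str) -> bool:
    """True if the four-character tag follows the general tag rules."""
    return len(tag) == 4 and _VALID_TAG.fullmatch(tag) is not None


def is_private_tag(tag: str) -> bool:
    """True if the tag matches the pattern set aside for private tags."""
    return is_valid_tag(tag) and _PRIVATE_TAG.fullmatch(tag) is not None
```

Under these assumptions, a valid tag that does not match the private pattern falls in the space available for registered tags.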
Every registered tag defines the semantics of the associated metadata values and the representation format of those values. Values for registered tags may be either textual or binary. Textual values are encoded in UTF-8 unless explicitly indicated otherwise.
The following registered tags are defined or reserved at this time:
Tag | Name | Format | Description |
---|---|---|---|
appl | (reserved) | | Reserved; used by Apple. |
bild | (reserved) | | Reserved; used by Apple. |
dlng | Design languages | Text, using only Basic Latin (ASCII) characters. | Indicates languages and/or scripts for the user audiences that the font was primarily designed for. Only one instance is used. See below for additional details. |
slng | Supported languages | Text, using only Basic Latin (ASCII) characters. | Indicates languages and/or scripts that the font is declared to be capable of supporting. Only one instance is used. See below for additional details. |
The values for 'dlng' and 'slng' consist of a series of comma-separated ScriptLangTags, which are described in detail below. Spaces may follow the comma delimiters and are ignored. Each ScriptLangTag identifies a language or script. A list of tags is interpreted to imply that all of the languages or scripts are included.
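A minimal sketch of splitting a 'dlng' or 'slng' value into its individual tags follows. The function name `split_script_lang_tags` is hypothetical. The sketch assumes the raw value is available as bytes and that empty items (for example, from a trailing comma) should be dropped; the text above does not address that case.

```python
import re
from typing import List


def split_script_lang_tags(value: bytes) -> List[str]:
    """Split a 'dlng'/'slng' value into its comma-separated tags."""
    # The format is restricted to Basic Latin, so decode strictly as ASCII;
    # a UnicodeDecodeError here signals a malformed value.
    text = value.decode("ascii")
    # A delimiter is a comma plus any spaces that follow it.
    items = re.split(r", *", text)
    # Assumption: empty items carry no information and are dropped.
    return [item for item in items if item]
```

For example, `split_script_lang_tags(b"Latn, Grek,Cyrl")` yields the three tags `Latn`, `Grek` and `Cyrl`.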
The 'dlng' value is used to indicate the languages or scripts of the primary user audiences for which the font was designed. This value may be useful for selecting default font formatting based on content language, for presenting filtered font options based on user language preferences, or similar applications involving the language or script of content or user settings.
The 'slng' value is used to declare languages or scripts that the font is capable of supporting. This value may be useful for font fallback mechanisms or other applications involving the language or script of content or user settings.
Note: Implementations that use 'slng' values in a font may choose to ignore Unicode-range bits set in the OS/2 table.
The following examples illustrate the distinction between design and supported languages:
- Consider the case of accented Latin letters: Although the accents are used in common by a number of languages, the precise shape of the accents can depend on the typographic traditions of a specific language. Polish, for example, prefers steeper accents than French. A font that was designed with accents specifically for Polish would declare Polish as a design language, but might declare support for any language using Latin script.
- Fonts designed for East Asian markets will generally include glyphs for Latin, Greek and Cyrillic because these characters are included in important East Asian character set standards, but using East Asian fonts for languages that are written with those scripts is generally unsatisfactory. Such fonts would therefore include these scripts in the 'slng' value, but not in their 'dlng' value.
- There are some systematic differences in glyph design for the characters shared by simplified and traditional Chinese, such as the way the “bone” radical is drawn in all characters using it. A font specifically designed for use with simplified Chinese can usually be used to display traditional Chinese, but any character with the “bone” radical will look wrong to readers of traditional Chinese. Such a font would include simplified Chinese in its 'dlng' value, but both simplified and traditional Chinese in its 'slng' value.
ScriptLangTag Values
The 'dlng' and 'slng' metadata use ScriptLangTag values, defined here.
A ScriptLangTag denotes a particular language or script associated with a font. These are adapted from the IETF BCP 47 specification, “Tags for Identifying Languages” (see http://tools.ietf.org/html/bcp47).
BCP 47 language tags can include various subtags that provide different types of qualifiers, such as language, script or region. In a BCP 47 language tag, a language subtag element is mandatory and other subtags are optional. ScriptLangTag values used for 'dlng' and 'slng' metadata values use a modification of the BCP 47 syntax: a script subtag is mandatory and other subtags are optional. The following augmented BNF syntax, adapted from BCP 47, is used:
ScriptLangTag = [language "-"] script ["-" region] *("-" variant) *("-" extension) ["-" privateuse]
The expansion of the elements and the intended semantics associated with each are as defined in BCP 47. Script subtags are taken from ISO 15924. At present, no extensions are defined for use in ScriptLangTags, and any extension will be ignored. Private-use elements, which are prefixed with “-x”, are defined by private agreement between the source and recipient and may be ignored.
Subtags must be valid for use in BCP 47 and contained in the Language Subtag Registry maintained by IANA. See http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry and section 3 of BCP 47 for details.
Note: OpenType Layout script and language system tags are not the same as those used in BCP 47 and should not be referenced when creating or processing ScriptLangTags.
Any ScriptLangTag value not conforming to these specifications is ignored.
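The grammar above can be approximated with a regular expression. The following is a sketch under stated assumptions, not a complete BCP 47 validator. The names `ScriptLangTag` and `parse_script_lang_tag` are hypothetical. The subtag shapes are taken from the BCP 47 ABNF, with the reserved four-letter language form omitted so that a leading four-letter subtag is always read as the script. Subtags are checked for shape only and are not looked up in the IANA registry. Extension and private-use parts are accepted but discarded.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ScriptLangTag(NamedTuple):
    language: Optional[str]
    script: str
    region: Optional[str]
    variants: Tuple[str, ...]


_TAG = re.compile(
    r"""
    (?:(?P<language>[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{5,8})-)?
    (?P<script>[A-Za-z]{4})
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    (?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*
    (?:-[xX](?:-[A-Za-z0-9]{1,8})+)?
    """,
    re.VERBOSE,
)


def parse_script_lang_tag(text: str) -> Optional[ScriptLangTag]:
    """Parse one ScriptLangTag; return None if it does not fit the grammar."""
    match = _TAG.fullmatch(text)
    if match is None:
        return None  # non-conforming values are ignored
    variants = tuple(v for v in match.group("variants").split("-") if v)
    return ScriptLangTag(
        language=match.group("language"),
        script=match.group("script"),
        region=match.group("region"),
        variants=variants,
    )
```

A caller that interprets only the script subtag can read the `script` field and disregard the rest.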
A ScriptLangTag can denote fairly specific information; for example, “en-Latn-IN” would represent ‘Latin script as used for the English language in India’. In most cases, however, generic tags should be used, and it is anticipated that most tags used in 'dlng' and 'slng' metadata declarations will consist only of a script subtag. Language or other subtags can be included, however, and may be appropriate in some cases. Implementations must allow for ScriptLangTags that include additional subtags, but they may also choose to interpret only the script subtag and ignore other subtags.
Examples:
- “Latn” denotes Latin script (and any language or writing system using Latin script).
- “Cyrl” denotes Cyrillic script.
- “sr-Cyrl” denotes Cyrillic script as used for writing the Serbian language; a font that has this property value may not be suitable for displaying text in Russian or other languages written using Cyrillic script.
- “en-Dsrt” denotes English written with the Deseret script.
- “Hant” denotes Traditional Chinese.
- “Hant-HK” denotes Traditional Chinese as used in Hong Kong.
- “Jpan” denotes Japanese writing — ISO 15924 defines “Jpan” as an alias for Han + Hiragana + Katakana.
- “Kore” denotes Korean writing — ISO 15924 defines “Kore” as an alias for Hangul + Han.
- “Hang” denotes Hangul script (exclusively — Hanja are not implied by “Hang”).
The Unicode Standard uses the ISO 15924 identifiers “Zinh” (‘inherited’) and “Zyyy” (‘undetermined’). These should not be used in ScriptLangTags. Similarly, “Zxxx” (‘unwritten document’) and “Zzzz” (‘unencoded script’) should never be used.
On the other hand, “Zmth” (‘Mathematical notation’) and “Zsym” (‘Symbols’) are not used in the Unicode Standard, yet they may be very useful as declarations in font files. (They were, in fact, added to ISO 15924 for use in relation to fonts.)
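The guidance on special-purpose script codes can be expressed as a simple filter. This is a sketch only; the names `is_excluded_script` and `filter_scripts` are hypothetical, and the choice to compare case-insensitively is an assumption based on BCP 47 subtags being case-insensitive.

```python
# Script codes that should not appear in ScriptLangTags, per the text above.
_EXCLUDED_SCRIPTS = frozenset({"Zinh", "Zyyy", "Zxxx", "Zzzz"})


def is_excluded_script(script: str) -> bool:
    """True for script codes that should not be used in ScriptLangTags."""
    # Normalize to the conventional title case before comparing.
    return script.capitalize() in _EXCLUDED_SCRIPTS


def filter_scripts(scripts):
    """Keep only the script codes that are appropriate for declarations."""
    return [s for s in scripts if not is_excluded_script(s)]
```

Note that "Zmth" and "Zsym" pass through this filter, since they are considered useful in font declarations.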
In relation to East Asian scripts, a declaration of “Jpan” can be used to cover hiragana, katakana and kanji. Similarly, “Kore” can be used to cover Hangul and hanja, though a Korean font with only Hangul support should use “Hang”. For Chinese fonts, “Hans” and “Hant” should normally be used to distinguish between Simplified and Traditional orthographies rather than the more generic declaration “Hani”. Region-specific variations such as “Hant-HK” can also be declared. In some cases, it may be appropriate to describe a font capability (but probably not design target) using the generic declaration “Hani” (denoting ‘Han / Hanzi / Kanji / Hanja’).
The BCP 47 specification for region subtags allows for continental and sub-continental regions. For example, “039” can be used to denote Southern Europe. Use of such extended-region subtags in ScriptLangTag values is not recommended as software implementations may not have the logic to make appropriate correlations to more specific regions or languages associated with those regions.