3.5.2 Character Types
Static Semantics
1
An enumeration type is said to be a character type if at least one of its enumeration literals is a character_literal.
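As an illustration of this rule, the following sketch declares a hypothetical type Signal that mixes an identifier literal with two character literals. The names Signals, Signal, Signal_Vector, Pattern and Mixed are invented for this example. Because at least one literal is a character_literal, Signal is a character type. A one-dimensional array of it is therefore a string type and can be initialized from a string literal.

```ada
--  Hypothetical package; all names here are illustrative only.
package Signals is
   --  One identifier literal plus two character literals: since at
   --  least one literal is a character_literal, Signal is a character type.
   type Signal is (Off, '0', '1');

   --  A one-dimensional array of a character type is a string type, so
   --  string literals built from Signal's character literals are legal.
   type Signal_Vector is array (Positive range <>) of Signal;

   Pattern : constant Signal_Vector := "0110";

   --  Off has no character_literal, so it can only appear in aggregates.
   Mixed : constant Signal_Vector := (Off, '1', Off);
end Signals;
```

A client can write Signals.Signal'Pos ('1') without a use clause, because character literals are resolved from context (see 4.2).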
2/3
{AI95-00285-01} {AI05-0181-1} The predefined type Character is a character type whose values correspond to the 256 code positions of Row 00 (also known as Latin-1) of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP). Each of the graphic characters of Row 00 of the BMP has a corresponding character_literal in Character. Each of the nongraphic positions of Row 00 has a corresponding language-defined name. This name is not usable as an enumeration literal, but it is usable with the attributes Image, Wide_Image, Wide_Wide_Image, Value, Wide_Value, and Wide_Wide_Value. These names are given in the definition of type Character in A.1, “The Package Standard”, but are set in italics.
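The following is a minimal sketch of the Image/Value behavior just described. It assumes an Ada 2012 compiler, and the procedure name Nongraphic_Names is hypothetical. The character at position 0 has no character_literal, so its image is its language-defined name in upper case, and Value maps that name back to the character.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical demo procedure (the name is illustrative only).
procedure Nongraphic_Names is
   NUL_Char : constant Character := Character'Val (0);
begin
   --  A graphic character's image is its character_literal, quotes included.
   Put_Line (Character'Image ('A'));       --  prints 'A'

   --  A nongraphic character has no literal; its image is its
   --  language-defined name in upper case.
   Put_Line (Character'Image (NUL_Char));  --  prints NUL

   --  Value accepts the same names and maps them back to the character.
   if Character'Value ("NUL") /= NUL_Char then
      raise Program_Error;
   end if;
   Put_Line (Integer'Image (Character'Pos (Character'Value ("LF"))));
   --  prints " 10"
end Nongraphic_Names;
```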
3/2
{AI95-00285-01} The predefined type Wide_Character is a character type whose values correspond to the 65536 code positions of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP). Each of the graphic characters of the BMP has a corresponding character_literal in Wide_Character. The first 256 values of Wide_Character have the same character_literal or language-defined name as defined for Character. Each of the graphic_characters has a corresponding character_literal.
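You can check that the first 256 values of Wide_Character coincide with Character by using the language-defined package Ada.Characters.Conversions (A.3.4, added in Ada 2005). The sketch below uses the hypothetical procedure name Wide_Prefix. It counts the Wide_Character values for which Is_Character returns True and checks that each one keeps its position when converted to Character.

```ada
with Ada.Text_IO;                use Ada.Text_IO;
with Ada.Characters.Conversions; use Ada.Characters.Conversions;

--  Hypothetical demo procedure (the name is illustrative only).
procedure Wide_Prefix is
   Shared : Natural := 0;
begin
   for W in Wide_Character loop
      --  Is_Character is True exactly for positions 0 .. 255.
      if Is_Character (W) then
         --  The corresponding Character has the same position.
         if Character'Pos (To_Character (W)) /= Wide_Character'Pos (W) then
            raise Program_Error;
         end if;
         Shared := Shared + 1;
      end if;
   end loop;
   Put_Line ("Shared values:" & Natural'Image (Shared));
   --  prints "Shared values: 256"
end Wide_Prefix;
```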
3.1/2
{AI95-00285-01} The predefined type Wide_Wide_Character is a character type whose values correspond to the 2147483648 code positions of the ISO/IEC 10646:2003 character set. Each of the graphic_characters has a corresponding character_literal in Wide_Wide_Character. The first 65536 values of Wide_Wide_Character have the same character_literal or language-defined name as defined for Wide_Character.
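The following sketch prints the size of each predefined character type, computed as the position of its last value plus one. It also shows that a shared literal keeps its position. The procedure and the type Position_Count are hypothetical, and the code assumes the compiler supports an integer range up to 2**31.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical demo procedure (the names are illustrative only).
procedure Character_Ranges is
   --  Wide enough to hold 2**31; assumes the compiler supports this range.
   type Position_Count is range 0 .. 2**31;

   Char_Count      : constant Position_Count :=
     Character'Pos (Character'Last) + 1;
   Wide_Count      : constant Position_Count :=
     Wide_Character'Pos (Wide_Character'Last) + 1;
   Wide_Wide_Count : constant Position_Count :=
     Wide_Wide_Character'Pos (Wide_Wide_Character'Last) + 1;
begin
   Put_Line ("Character:" & Position_Count'Image (Char_Count));
   --  prints "Character: 256"
   Put_Line ("Wide_Character:" & Position_Count'Image (Wide_Count));
   --  prints "Wide_Character: 65536"
   Put_Line ("Wide_Wide_Character:" & Position_Count'Image (Wide_Wide_Count));
   --  prints "Wide_Wide_Character: 2147483648"

   --  The first 65536 literals are shared, so positions agree.
   Put_Line ("Pos of 'A':"
             & Position_Count'Image (Wide_Wide_Character'Pos ('A')));
   --  prints "Pos of 'A': 65"
end Character_Ranges;
```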
3.2/2
{AI95-00285-01} The characters whose code position is larger than 16#FF# and which are not graphic_characters have language-defined names. Each name is formed by appending to the string "Hex_" the representation of the code position in hexadecimal as eight extended digits. As with other language-defined names, these names are usable only with the attributes (Wide_)Wide_Image and (Wide_)Wide_Value; they are not usable as enumeration literals.
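The naming rule above is mechanical enough to sketch as a function. Hex_Name is a hypothetical helper and is not part of any language-defined package. It formats a code position as "Hex_" followed by eight hexadecimal digits. Uppercase digits are an assumption here, since extended digits may be written in either case. The function assumes the caller has already established that the position is above 16#FF# and nongraphic; it checks neither condition.

```ada
--  Hypothetical helper, not a language-defined subprogram: builds the
--  name "Hex_" & eight hexadecimal digits for a code position.
--  Assumes Code > 16#FF# and that the character is nongraphic; neither
--  condition is checked here.
function Hex_Name (Code : Natural) return String is
   Hex_Digits : constant String := "0123456789ABCDEF";
   Result     : String := "Hex_00000000";  --  12 characters
   Remaining  : Natural := Code;
begin
   --  Fill the eight digit slots from least to most significant.
   for I in reverse Result'First + 4 .. Result'Last loop
      Result (I) := Hex_Digits (Hex_Digits'First + Remaining mod 16);
      Remaining := Remaining / 16;
   end loop;
   return Result;
end Hex_Name;
```

For example, Hex_Name (16#FFFE#) returns "Hex_0000FFFE". Natural'Last (2**31 - 1, the last position of Wide_Wide_Character) still fits in eight digits.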
3.a/2
Reason: {AI95-00285-01} The language-defined names are not usable as enumeration literals to avoid "polluting" the name space. Wide_Character and Wide_Wide_Character are defined in Standard. If the language-defined names were usable as enumeration literals, they would therefore hide other nonoverloadable declarations with the same names in use-d packages.
3.b/2
{AI95-00285-01} ISO 10646 has not defined the meaning of all of the code positions from 0100 through FFFD, but they are all considered graphic characters by Ada to simplify the implementation, and to allow for revisions to ISO 10646. In ISO 10646, FFFE and FFFF are special, and will never be associated with graphic characters in any revision.
Implementation Permissions
4/2
This paragraph was deleted. {AI95-00285-01} In a nonstandard mode, an implementation may provide other interpretations for the predefined types Character and Wide_Character[, to conform to local conventions].
Implementation Advice
5/2
This paragraph was deleted. {AI95-00285-01} If an implementation supports a mode with alternative interpretations for Character and Wide_Character, the set of graphic characters of Character should nevertheless remain a proper subset of the set of graphic characters of Wide_Character. Any character set “localizations” should be reflected in the results of the subprograms defined in the language-defined package Characters.Handling (see A.3) available in such a mode. In a mode with an alternative interpretation of Character, the implementation should also support a corresponding change in what is a legal identifier_letter.
6
28 The language-defined library package Characters.Latin_1 (see A.3.3) includes the declaration of constants denoting control characters, lower case characters, and special characters of the predefined type Character.
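The constants in Ada.Characters.Latin_1 are ordinary values of type Character, so the attributes described above apply to them directly. The sketch below uses the hypothetical procedure name Latin_1_Constants and the package constants NUL, HT, LF and LC_A.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Characters.Latin_1;

--  Hypothetical demo procedure (the name is illustrative only).
procedure Latin_1_Constants is
   package L1 renames Ada.Characters.Latin_1;
begin
   --  The constants denote existing Character values; nothing new is
   --  added to the type.
   Put_Line (Integer'Image (Character'Pos (L1.LF)));  --  prints " 10"
   Put_Line (Character'Image (L1.NUL));               --  prints NUL
   Put_Line (Character'Image (L1.LC_A));              --  prints 'a'

   --  They concatenate with other Character and String values.
   Put_Line ("col1" & L1.HT & "col2");
end Latin_1_Constants;
```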
6.a
To be honest: The package ASCII does
the same, but only for the first 128 characters of Character. Hence,
it is an obsolescent package, and we no longer mention it here.
7
29 A conventional character set such as EBCDIC can be declared as a character type; the internal codes of the characters can be specified by an enumeration_representation_clause as explained in clause 13.4.
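The following sketch declares a hypothetical package Ebcdic_Subset that models only a few EBCDIC code points (space 16#40#, 'A' to 'C' at 16#C1# to 16#C3#, '0' to '2' at 16#F0# to 16#F2#). The literals are listed in increasing code order, because the specified codes must respect the type's ordering. An Ada.Unchecked_Conversion instance exposes the internal code.

```ada
with Ada.Unchecked_Conversion;
with Interfaces;

--  Hypothetical package; only a few EBCDIC code points are modeled.
package Ebcdic_Subset is
   --  Literals are listed in increasing code order, as an
   --  enumeration_representation_clause requires (13.4).
   type Ebcdic_Char is (' ', 'A', 'B', 'C', '0', '1', '2');
   for Ebcdic_Char use
     (' ' => 16#40#, 'A' => 16#C1#, 'B' => 16#C2#, 'C' => 16#C3#,
      '0' => 16#F0#, '1' => 16#F1#, '2' => 16#F2#);
   for Ebcdic_Char'Size use 8;

   --  A string type over the EBCDIC subset.
   type Ebcdic_String is array (Positive range <>) of Ebcdic_Char;

   --  Exposes the internal code chosen by the representation clause.
   function Code_Of is
     new Ada.Unchecked_Conversion (Ebcdic_Char, Interfaces.Unsigned_8);

   Greeting : constant Ebcdic_String := "AB 12";
end Ebcdic_Subset;
```

The logical position is unaffected by the clause. Ebcdic_Char'Pos ('A') is 1, while Code_Of ('A') is 16#C1#, and 'A' < '0' holds for this type, as it does in EBCDIC.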
Examples
8
Example of a character type:
9
type Roman_Digit is ('I', 'V', 'X', 'L', 'C', 'D', 'M');
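The sketch below shows the example type in use. It wraps Roman_Digit in a hypothetical package Roman, together with an invented string type Roman_Numeral and an invented function Value. Because Roman_Digit is a character type, numerals can be written as string literals. Value applies the usual subtractive rule and does not check that its input is a well-formed numeral. The spec and body share one listing here; with GNAT they can be split into roman.ads and roman.adb (for example with gnatchop).

```ada
package Roman is
   type Roman_Digit is ('I', 'V', 'X', 'L', 'C', 'D', 'M');

   --  Hypothetical string type: a numeral such as "XLII" is a legal
   --  string literal because Roman_Digit is a character type.
   type Roman_Numeral is array (Positive range <>) of Roman_Digit;

   --  Hypothetical function; no validation of the numeral is done.
   function Value (N : Roman_Numeral) return Natural;
end Roman;

package body Roman is
   Weight : constant array (Roman_Digit) of Positive :=
     ('I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
      'C' => 100, 'D' => 500, 'M' => 1000);

   function Value (N : Roman_Numeral) return Natural is
      Total : Integer := 0;
   begin
      for I in N'Range loop
         --  A digit followed by a larger one is subtracted (as in "IV").
         if I < N'Last and then Weight (N (I)) < Weight (N (I + 1)) then
            Total := Total - Weight (N (I));
         else
            Total := Total + Weight (N (I));
         end if;
      end loop;
      return Total;
   end Value;
end Roman;
```

For example, Roman.Value ("MCMXCIV") returns 1994.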
Inconsistencies With Ada 83
9.a
The declaration of Wide_Character
in package Standard hides use-visible declarations with the same defining
identifier. In the unlikely event that an Ada 83 program had depended
on such a use-visible declaration, and the program remains legal after
the substitution of Standard.Wide_Character, the meaning of the program
will be different.
Incompatibilities With Ada 83
9.b
The presence of Wide_Character
in package Standard means that an expression such as
9.c
'a' = 'b'
9.d
is ambiguous in Ada 95, whereas in Ada 83 both
literals could be resolved to be of type Character.
9.e
The change in visibility rules (see
4.2)
for character literals means that additional qualification might be necessary
to resolve expressions involving overloaded subprograms and character
literals.
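The sketch below illustrates both incompatibilities and the usual fix, which is a qualified expression. The procedure Qualify_Literals and its two Show overloads are hypothetical. The ambiguous forms are left as comments because they would not compile.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical demo of resolving character literals by qualification.
procedure Qualify_Literals is
   --  Same : constant Boolean := 'a' = 'b';  --  illegal: ambiguous
   Same : constant Boolean := Character'('a') = 'b';

   --  Two overloads that differ only in the character type they accept.
   procedure Show (C : Character) is
   begin
      Put_Line ("Character " & C);
   end Show;

   procedure Show (W : Wide_Character) is
   begin
      Put_Line ("Wide_Character" & Integer'Image (Wide_Character'Pos (W)));
   end Show;
begin
   --  Show ('x');  --  illegal: 'x' is a literal of both parameter types
   Show (Character'('x'));           --  prints "Character x"
   Show (Wide_Character'('x'));      --  prints "Wide_Character 120"
   Put_Line (Boolean'Image (Same));  --  prints "FALSE"
end Qualify_Literals;
```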
Extensions to Ada 83
9.f
The type Character has been
extended to have 256 positions, and the type Wide_Character has been
added. Note that this change was already approved by the ARG for Ada
83 conforming compilers.
9.g
The rules for referencing character literals are changed (see 4.2), so that the declaration of the character type need not be directly visible to use its literals, similar to null and string literals. Context is used to resolve their type.
Inconsistencies With Ada 95
9.h/2
{AI95-00285-01} Ada 95 defined most characters in Wide_Character to be graphic characters, while Ada 2005 uses the categorizations from ISO/IEC 10646:2003. Ada 2005 also provides language-defined names for all non-graphic characters. As a result, in Ada 2005 Wide_Character'Wide_Value will raise Constraint_Error for a string representing a character_literal of a non-graphic character, while Ada 95 would have accepted it. Similarly, the result of Wide_Character'Wide_Image will change for such non-graphic characters.
9.i/3
{AI95-00395-01} {AI05-0005-1} The language-defined names FFFE and FFFF were replaced by a consistent set of language-defined names for all non-graphic characters with positions greater than 16#FF#. As a result, in Ada 2005 Wide_Character'Wide_Value("FFFE") will raise Constraint_Error, while Ada 95 would have accepted it. Similarly, the result of Wide_Character'Wide_Image will change for the position numbers 16#FFFE# and 16#FFFF#. This is very unlikely to matter in practice, as these names do not represent usable characters.
9.j/2
{AI95-00285-01} {AI95-00395-01} Because of the previously mentioned changes to the Wide_Character'Wide_Image of various character values, the value of attribute Wide_Width will change for some subtypes of Wide_Character. However, the new language-defined names were chosen so that the value of Wide_Character'Wide_Width itself does not change.
9.k/2
{AI95-00285-01} The declaration of Wide_Wide_Character in package Standard hides use-visible declarations with the same defining identifier. In the (very) unlikely event that an Ada 95 program had depended on such a use-visible declaration, and the program remains legal after the substitution of Standard.Wide_Wide_Character, the meaning of the program will be different.
Extensions to Ada 95
9.l/2
Wording Changes from Ada 95
9.m/2
{AI95-00285-01} Characters are now defined in terms of the entire ISO/IEC 10646:2003 character set.
9.n/3
{AI95-00285-01} {AI05-0248-1} We dropped the Implementation Advice for nonstandard interpretation of character sets. An implementation can do what it wants in a nonstandard mode, so there isn't much point to any advice.
Wording Changes from Ada 2005
9.o/3
{AI05-0181-1} Correction: Removed the position numbers of nongraphic characters from the text, as they were wrong and thus misleading.
Ada 2005 and 2012 Editions sponsored in part by Ada-Europe