2.1 Character Set
1
{character set} The only characters allowed outside of comments are the graphic_characters and format_effectors.
1.a
Ramification: Any character, including an other_control_function, is allowed in a comment.
1.b
Note that this rule doesn't
really have much force, since the implementation can represent characters
in the source in any way it sees fit. For example, an implementation
could simply define that what seems to be a non-graphic, non-format-effector
character is actually a representation of the space character.
1.c
Discussion: It is our
intent to follow the terminology of ISO 10646 BMP where appropriate,
and to remain compatible with the character classifications defined in
A.3, ``Character Handling''.
Note that our definition for graphic_character
is more inclusive than that of ISO 10646-1.
Syntax
2
character ::= graphic_character | format_effector | other_control_function
3
graphic_character ::= identifier_letter | digit | space_character | special_character
Static Semantics
4
The character repertoire for the text of an Ada
program consists of the collection of characters called the Basic Multilingual
Plane (BMP) of the ISO 10646 Universal Multiple-Octet Coded Character
Set, plus a set of format_effectors
and, in comments only, a set of other_control_functions;
the coded representation for these characters is implementation defined
[(it need not be a representation defined within ISO 10646-1)].
4.a
Implementation defined: The
coded representation for the text of an Ada program.
5
The description of the language definition in
this International Standard uses the graphic symbols defined for Row
00: Basic Latin and Row 00: Latin-1 Supplement of the ISO 10646 BMP;
these correspond to the graphic symbols of ISO 8859-1 (Latin-1); no graphic
symbols are used in this International Standard for characters outside
of Row 00 of the BMP. The actual set of graphic symbols used by an implementation
for the visual representation of the text of an Ada program is not specified.
{unspecified [partial]}
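The correspondence described in paragraph 5 is positional: the ISO 8859-1 character with code value b is the BMP character at code position b in Row 00. As a small illustration only, assuming Python's built-in "latin-1" codec as a stand-in for ISO 8859-1 (the helper name row00_matches_latin1 is hypothetical), every Latin-1 byte value decodes to the BMP character with the same position:

```python
import unicodedata


def row00_matches_latin1() -> bool:
    """True if each ISO 8859-1 byte value b decodes to the BMP character
    at code position b, i.e. the Latin-1 table coincides with Row 00."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


if __name__ == "__main__":
    print(row00_matches_latin1())  # True
    # Byte 0xE9 in Latin-1 is code position 00E9 of the BMP.
    print(unicodedata.name(bytes([0xE9]).decode("latin-1")))  # LATIN SMALL LETTER E WITH ACUTE
```

This says nothing about the graphic symbols an implementation actually displays, which paragraph 5 leaves unspecified.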
6
The categories of characters are defined as follows:
7
- {identifier_letter} identifier_letter
  upper_case_identifier_letter | lower_case_identifier_letter
7.a
Discussion: We use identifier_letter
instead of simply letter because
ISO 10646 BMP includes many other characters that would generally be
considered "letters."
8
- {upper_case_identifier_letter} upper_case_identifier_letter
  Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Capital Letter''.
9
- {lower_case_identifier_letter} lower_case_identifier_letter
  Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Small Letter''.
9.a/1
This paragraph was deleted.
To be honest: {8652/0001} The above rules do not include the ligatures Æ and æ. However, the intent is to include these characters as identifier letters. This problem was pointed out by a comment from the Netherlands.
10
- {digit} digit
  One of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
11
- {space_character} space_character
  The character of ISO 10646 BMP named ``Space''.
12
- {special_character} special_character
  Any character of the ISO 10646 BMP that is not reserved for a control function, and is not the space_character, an identifier_letter, or a digit.
12.a
Ramification: Note that
the no break space and soft hyphen are special_characters,
and therefore graphic_characters.
They are not the same characters as space and hyphen-minus.
13
- {format_effector} format_effector
  The control functions of ISO 6429 called character tabulation (HT), line tabulation (VT), carriage return (CR), line feed (LF), and form feed (FF). {control character: See also format_effector}
14
- {other_control_function} other_control_function
  Any control function, other than a format_effector, that is allowed in a comment; the set of other_control_functions allowed in comments is implementation defined. {control character: See also other_control_function}
14.a
Implementation defined: The
control functions allowed in comments.
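The category definitions in paragraphs 7 through 14 can be sketched as a classifier for a single BMP character. This is a minimal Python illustration under stated assumptions, not part of the standard: the function name classify and the use of None for characters that are neither graphic_characters nor format_effectors are hypothetical choices, since which control functions count as other_control_functions is implementation defined. Python's unicodedata module reports current Unicode names, which can differ from the ISO 10646-1 names the standard relied on; for example, Æ is named "LATIN CAPITAL LETTER AE", so this sketch treats Æ and æ as identifier letters, in line with the stated intent to include them.

```python
import unicodedata
from typing import Optional

# Paragraph 13: HT, VT, CR, LF, FF.
FORMAT_EFFECTORS = {"\t", "\x0b", "\r", "\n", "\x0c"}


def classify(ch: str) -> Optional[str]:
    """Return the 2.1 category of a single BMP character, or None if it is
    neither a graphic_character nor a format_effector."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("character is outside the BMP")
    if ch in FORMAT_EFFECTORS:
        return "format_effector"
    # Positions excluded from graphic_character (see note 17). Whether a
    # control here is an allowed other_control_function (comments only)
    # is implementation defined, so no category is claimed for it.
    if cp <= 0x1F or 0x7F <= cp <= 0x9F or cp >= 0xFFFE:
        return None
    if ch == " ":
        return "space_character"
    if "0" <= ch <= "9":
        return "digit"
    if cp <= 0xFF:  # identifier letters come only from Row 00
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN CAPITAL LETTER"):
            return "upper_case_identifier_letter"
        if name.startswith("LATIN SMALL LETTER"):
            return "lower_case_identifier_letter"
    # Any remaining position not reserved for a control function.
    return "special_character"


if __name__ == "__main__":
    print(classify("\u00c6"))  # upper_case_identifier_letter
    print(classify("\u00a0"))  # special_character (no break space, see 12.a)
    print(classify("\u00ad"))  # special_character (soft hyphen, see 12.a)
```

Checking the format_effectors before the excluded ranges matters, because HT, VT, CR, LF, and FF all lie in 0000-001F.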
15
{names of special_characters} {special_character (names)} The following names are used when referring to certain special_characters:
{quotation mark} {number sign} {ampersand} {apostrophe} {tick} {left parenthesis} {right parenthesis} {asterisk} {multiply} {plus sign} {comma} {hyphen-minus} {minus} {full stop} {dot} {point} {solidus} {divide} {colon} {semicolon} {less-than sign} {equals sign} {greater-than sign} {low line} {underline} {vertical line} {left square bracket} {right square bracket} {left curly bracket} {right curly bracket}
15.a
Discussion: These are
the ones that play a special role in the syntax of Ada 95, or in the
syntax rules; we don't bother to define names for all characters. The
first name given is the name from ISO 10646-1; the subsequent names,
if any, are those used within the standard, depending on context.
symbol  name                     symbol  name
"       quotation mark           :       colon
#       number sign              ;       semicolon
&       ampersand                <       less-than sign
'       apostrophe, tick         =       equals sign
(       left parenthesis         >       greater-than sign
)       right parenthesis        _       low line, underline
*       asterisk, multiply       |       vertical line
+       plus sign                [       left square bracket
,       comma                    ]       right square bracket
-       hyphen-minus, minus      {       left curly bracket
.       full stop, dot, point    }       right curly bracket
/       solidus, divide
Implementation Permissions
16
In a nonstandard mode, the implementation may
support a different character repertoire[; in particular, the set of
characters that are considered identifier_letters
can be extended or changed to conform to local conventions].
16.a
Ramification: If an implementation
supports other character sets, it defines which characters fall into
each category, such as ``identifier_letter,''
and what the corresponding rules of this section are, such as which characters
are allowed in the text of a program.
17
1 Every code position of ISO 10646 BMP that is not reserved for a control function is defined to be a graphic_character by this International Standard. This includes all code positions other than 0000-001F, 007F-009F, and FFFE-FFFF.
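Note 1 reduces to simple arithmetic on the 65,536 BMP code positions: 32 positions in 0000-001F, 33 in 007F-009F, and 2 in FFFE-FFFF are excluded, which leaves 65,469 graphic_characters. A minimal Python sketch of that count, using a hypothetical helper is_graphic_position that encodes only the three ranges listed in the note:

```python
def is_graphic_position(cp: int) -> bool:
    """True if BMP code position cp is a graphic_character per note 1."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("not a BMP code position")
    excluded = cp <= 0x1F or 0x7F <= cp <= 0x9F or cp >= 0xFFFE
    return not excluded


# Booleans sum as 0 or 1, so this counts the graphic positions.
graphic_count = sum(is_graphic_position(cp) for cp in range(0x10000))
print(graphic_count)  # 65469
```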
18
2 The language does not
specify the source representation of programs.
18.a
Discussion: Any source
representation is valid so long as the implementer can produce an (information-preserving)
algorithm for translating in both directions between the representation
and the standard character set. (For example, every character in the
standard character set has to be representable, even if the output devices
attached to a given computer cannot print all of those characters properly.)
From a practical point of view, every implementer will have to provide
some way to process the ACVC. It is the intent to allow source representations,
such as parse trees, that are not even linear sequences of characters.
It is also the intent to allow different fonts: reserved words might
be in bold face, and that should be irrelevant to the semantics.
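The information-preserving criterion of 18.a can be illustrated for the common case of a byte encoding. This is a hedged sketch, assuming Python's standard codecs as candidate representations; the helper names round_trips and represents_bmp are hypothetical. It does not model the non-linear representations (such as parse trees) that 18.a also permits, and it skips the surrogate range D800-DFFF because Python's strict codecs refuse to encode those positions as lone characters.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if encoding and then decoding text gives back the same text."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:  # text not representable, or bytes not decodable
        return False


def represents_bmp(encoding: str) -> bool:
    """Check the round trip over every BMP code position except the
    surrogate range D800-DFFF (a limitation of this sketch)."""
    text = "".join(chr(cp) for cp in range(0x10000) if not 0xD800 <= cp <= 0xDFFF)
    return round_trips(text, encoding)


if __name__ == "__main__":
    print(represents_bmp("utf-8"))    # True
    print(represents_bmp("latin-1"))  # False: only Row 00 is representable
```

Under these assumptions, a Latin-1 source representation would fail the criterion for the full repertoire, while UTF-8 would pass it.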
Extensions to Ada 83
18.b
{extensions to Ada 83}
Ada 95 allows 8-bit and 16-bit characters, as well
as implementation-specified character sets.
Wording Changes from Ada 83
18.c
The syntax rules in this clause
are modified to remove the emphasis on basic characters vs. others. (In
this day and age, there is no need to point out that you can write programs
without using (for example) lower case letters.) In particular, character
(representing all characters usable outside comments) is added, and basic_graphic_character,
other_special_character, and basic_character
are removed. Special_character is
expanded to include Ada 83's other_special_character,
as well as new 8-bit characters not present in Ada 83. Note that the
term ``basic letter'' is used in A.3, ``Character
Handling'' to refer to letters without diacritical marks.
18.d
Character names now come from
ISO 10646.
18.e
We use identifier_letter
rather than letter since ISO 10646
BMP includes many "letters" that are not permitted in identifiers
(in the standard mode).