ACATS 3.0 User's Guide
Annex E
Guidelines for Test Development
The guidelines used for developing recent ACATS tests
are summarized in this Annex. Developers of potential ACATS tests should
follow these guidelines closely. Tests that deviate extensively from
these guidelines are far less likely to be added to the ACATS than those
that follow them carefully.
Tests should follow
the test structure and organization of existing ACATS tests. Many details
of existing tests are described elsewhere in this document. Important
topics include:
Test layout and prologue (see 4.4);
Identifier and reserved word conventions (see 4.5);
Library unit naming within tests (see 4.3.3);
Executable test structure (see 4.6); and
Indication of errors in B and L tests (see 4.6).
Submitting test authors should assign letters rather than numbers to
the test number (character positions 6 and 7). Letters that are likely
to be unique (such as the author's initials) are preferred.
(Of course, different tests for the same clause with the same author
should have different names.) The ACAA Technical Agent will perform the
final naming of tests in order to ensure that the names are unique and
appropriate. Following this guideline reduces the chance of test conflicts
between authors.
If a single test file contains multiple compilation
units, they should be given in an order such that any dependent units
follow the units they depend on (so an implementation can process the
units sequentially in order). However, it should be assumed that all
units in a file will be presented to the compiler simultaneously. If
a test requires units to be presented in a specific order (as some separate
compilation tests do), the units with ordering requirements should be
in separate files, and the required order should be documented in the
test prologue.
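For example, a file following this ordering might look like the sketch
below (the unit names are hypothetical placeholders, not real ACATS names):

    package CXXXA0_Support is            -- No dependencies; given first.
       Counter : Natural := 0;
    end CXXXA0_Support;

    with CXXXA0_Support;                 -- Depends on the unit above, so it
    procedure CXXXA0 is                  -- follows that unit in the file.
    begin
       CXXXA0_Support.Counter := 1;
    end CXXXA0;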
When possible for B-Tests, only the last unit in
a file should contain errors (or even better, units with errors should
be in separate files from those without errors). This avoids penalizing
implementations that process units in a file sequentially and stop on
the first bad unit. (Multiple units in a single file are fairly rare
in real user code; the ACATS shouldn't require work in areas not useful
to typical users.) This guideline should be violated only when placing
each unit with errors in a separate file would require a prohibitive
number of files.
(The maximum number of separate files in a test is 36 plus any foundations,
because the naming conventions for tests only leave a single character
for sequence numbers.)
C-Tests (especially those testing rules that are
not runtime checks) should be written in a usage-oriented style. That
means that the tests should reflect the way the features are typically
used in practice. Using a feature with no context is discouraged. For
instance, C-Tests for limited with clauses should use them to
declare mutually dependent types (the reason that limited with
clauses were added to the language) rather than just using them to replace
regular with clauses.
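For example, a usage-oriented test of limited with clauses might declare
mutually dependent types along the following lines (a minimal sketch with
invented package names and operations; bodies omitted):

    limited with Offices;
    package Employees is
       type Employee is tagged null record;
       procedure Assign_Office
         (E : in out Employee;
          O : access Offices.Office);     -- Uses the limited view of Offices.
    end Employees;

    limited with Employees;
    package Offices is
       type Office is tagged null record;
       procedure Assign_Occupant
         (O : in out Office;
          E : access Employees.Employee); -- Uses the limited view of Employees.
    end Offices;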
Tests should avoid the use of Text_IO (unless required
by the test objective). In particular, C-Tests should not create messages
with Text_IO; all messages should be generated via the subprograms in
the Report package. Messages in C-Tests should be written in mixed case,
not all UPPER CASE. Failure messages should be unique, so that the exact
failure can be pinpointed. This is often accomplished by including a
subtest identifier in the messages.
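For example, a C-Test might generate its messages as in the following
sketch (the test name, objective, and message text are invented):

    with Report;
    procedure CXXXAB is
       Value : constant Integer := Report.Ident_Int (2);
    begin
       Report.Test ("CXXXAB", "Check that doubling an integer works");
       if Value * 2 /= 4 then
          Report.Failed ("Wrong result from multiplication - Subtest 1");
       end if;
       if Value + Value /= 4 then
          Report.Failed ("Wrong result from addition - Subtest 2");
       end if;
       Report.Result;
    end CXXXAB;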
Tests can combine multiple objectives if a test
for a single objective is very short. However, the objectives should
be related, and no more than 4 objectives should be tested in a single
test (to avoid creating gigantic tests that are hard to understand and
use). Objectives from different clauses should never be combined, as
that makes it hard to find the associated test (it will necessarily be
filed in the wrong clause for one of the objectives).
When possible, tests should define (and thus share)
foundation code (see
4.1.4). Foundation packages
are a better alternative than creating large tests with many objectives
when the primary reason for combining the objectives is to avoid writing
set-up code multiple times. Foundation code is specific to tests for
a particular clause, however, so this technique cannot be used to combine
objectives from multiple clauses.
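For instance, set-up declarations needed by several tests for a clause
might be placed in a foundation package along these lines (hypothetical
names; real foundation names follow the conventions of 4.1.4):

    package FXXXA00 is                    -- Foundation shared by the tests.
       type Color is (Red, Green, Blue);
       type Item is record
          Hue   : Color   := Red;
          Count : Natural := 0;
       end record;
    end FXXXA00;

    with FXXXA00;
    with Report;
    procedure CXXXA01 is                  -- One of the tests using it.
       Object : FXXXA00.Item;
    begin
       Report.Test ("CXXXA01", "Check ... (objective text)");
       -- ... test code using Object ...
       Report.Result;
    end CXXXA01;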
When a rule includes a term defined elsewhere,
testing of the rule should include testing all of the combinations implied
by the term. For instance, if we have a definition like "something
is either this, that, or fuzzy", then a test for a test objective
involving something should test cases where something is this and that
and fuzzy. If multiple layers of definitions make this impractical, then
a wide selection of combinations (as different as possible) should be
tried. The only exception to this rule is if separate tests of the definition
exist or should exist (that is, there is a test objective to test that
the definition is appropriately implemented).
When appropriate, tests should try a variety of
things. For instance, when testing subprograms, both procedures and functions
should be tested, with varying numbers of parameters, and with different
modes and types. Similarly, types should be more than just Integer:
tagged types, tasks, protected types, and anonymous access types should
be tried. However, adding variety should not be used as an excuse to
create multiple tests for an objective when one will do. That is, variety
is a secondary goal; exhaustive coverage of possibilities isn't needed
(unless the testing includes testing a term defined elsewhere, as described
in the previous guideline). Remember that the goal isn't to test combinations
of features; the point of using variety is to ensure that the objective
being tested works in more than just the simplest cases.
Tests should generally use only 7-bit ASCII characters.
However, some tests will need to use other characters in
order to test Wide_Wide_Character support, Unicode characters in identifiers,
and the like. Such tests should be encoded in UTF-8 and start with a
UTF-8 Byte Order Mark. Tests should only use the code points that were
assigned in version 4.0 of the Unicode standard; when possible, using
only the roughly 680 characters generally available on US Windows systems
is recommended.
When constructing tests that verify that run-time checks are performed,
take special care that the permissions of 11.6 don't
render the test impotent. 11.6(5) allows language-defined checks to be
optimized away if the result of the operation is not used (even if the
exception is handled). That means it is critical that the values that
fail checks are used in some way afterwards (even though a correct program
will never execute that code). Failure to do that could allow a compiler
to optimize the entire test away, and that would require the test to
be corrected later.
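For example, a test that checks for Constraint_Error can use the value
that failed the check in its failure message, so the check cannot be
removed under 11.6(5) (a sketch; the test name and subtype are invented):

    with Report;
    procedure CXXXAC is
       subtype Small is Integer range 1 .. 10;
       X : Small;
    begin
       Report.Test ("CXXXAC", "Check that range checks are performed");
       begin
          X := Report.Ident_Int (11);     -- Should raise Constraint_Error.
          -- Use X afterwards so 11.6(5) does not permit removing the check:
          Report.Failed ("Constraint_Error not raised; X =" &
                         Integer'Image (X));
       exception
          when Constraint_Error =>
             null;                        -- Expected.
       end;
       Report.Result;
    end CXXXAC;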
When creating a B-Test for which different parts
test different errors, each error should identify the intended failure.
The standard error indication includes a colon; any needed text can follow
that colon. If the error identification will not fit on one line, place
it elsewhere and refer to it by an index. One common way to do this is to put
a list of intended errors into the header, labeling each with a letter.
Then each error comment can just identify the letter of the intended
error.
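For example (a sketch with invented errors; the error comments follow the
conventions for B-Tests described in 4.6):

    -- The intended errors are:
    --    (A) Wrong type in an assignment.
    --    (B) Reference to an undefined identifier.

    procedure BXXXAD is
       I : Integer;
    begin
       I := "Text";                       -- ERROR: (A)
       I := Undefined + 1;                -- ERROR: (B)
    end BXXXAD;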
Tests that cover test objectives that are documented
as untested are especially welcome as test submissions. Tests that cover
previously tested objectives are less likely to be included in the test
suite.