ACATS 3.0 User's Guide
Annex E
Guidelines for Test Development
The guidelines used for developing recent ACATS tests
are summarized in this Annex. Developers of potential ACATS tests should
follow these guidelines closely. Tests that deviate extensively from
these guidelines are far less likely to be added to the ACATS than those
that follow them carefully.
Tests should follow
the test structure and organization of existing ACATS tests. Many details
of existing tests are described elsewhere in this document. Important
topics include:
Test layout and prologue (see 4.4);
Identifier and reserved word conventions (see 4.5);
Library unit naming within tests (see 4.3.3);
Executable test structure (see 4.6); and
Indication of errors in B and L tests (see 4.6).
Submitting test authors should assign letters rather than numbers to
the test number (character positions 6 and 7). Letters that are likely
to be unique (such as the author's initials) are preferred.
(Of course, different tests for the same clause with the same author
should have different names.) The ACAA Technical Agent will perform the
final naming of tests in order to ensure that the names are unique and
appropriate. Following this guideline reduces the chance of test conflicts
between authors.
If a single test file contains multiple compilation
units, they should be given in an order such that any dependent units
follow the units they depend on (so an implementation can process the
units sequentially in order). However, it should be assumed that all
units in a file will be presented to the compiler simultaneously. If
a test requires units to be presented in a specific order (as some separate
compilation tests do), the units with ordering requirements should be
in separate files, and the required order should be documented in the
test prologue.
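For example, a file following this ordering might look like the sketch
below (the unit names are hypothetical placeholders, not real ACATS names):

    package CXXXA0_Support is            -- No dependencies; given first.
       Counter : Natural := 0;
    end CXXXA0_Support;

    with CXXXA0_Support;                 -- Depends on the unit above, so it
    procedure CXXXA0 is                  -- follows that unit in the file.
    begin
       CXXXA0_Support.Counter := 1;
    end CXXXA0;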
When possible for B-Tests, only the last unit in
a file should contain errors (or even better, units with errors should
be in separate files from those without errors). This avoids penalizing
implementations that process units in a file sequentially and stop on
the first bad unit. (Multiple units in a single file are fairly rare
in real user code; the ACATS shouldn't require work in areas not useful
to typical users.) This guideline should be violated only when placing
each unit with errors in a separate file would require a prohibitive
number of files.
(The maximum number of separate files in a test is 36 plus any foundations,
because the naming conventions for tests only leave a single character
for sequence numbers.)
C-Tests (especially those testing rules that are
not runtime checks) should be written in a usage-oriented style. That
means that the tests should reflect the way the features are typically
used in practice. Using a feature with no context is discouraged. For
instance, C-Tests for limited with clauses should use them to
declare mutually dependent types (the reason that limited with
clauses were added to the language) rather than just using them to replace
regular with clauses.
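For example, a usage-oriented test of limited with clauses might declare
mutually dependent types along the following lines (a minimal sketch with
invented package names and operations; bodies omitted):

    limited with Offices;
    package Employees is
       type Employee is tagged null record;
       procedure Assign_Office
         (E : in out Employee;
          O : access Offices.Office);     -- Uses the limited view of Offices.
    end Employees;

    limited with Employees;
    package Offices is
       type Office is tagged null record;
       procedure Assign_Occupant
         (O : in out Office;
          E : access Employees.Employee); -- Uses the limited view of Employees.
    end Offices;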
Tests should avoid the use of Text_IO (unless required
by the test objective). In particular, C-Tests should not create messages
with Text_IO; all messages should be generated via the subprograms in
the Report package. Messages in C-Tests should be written in mixed case,
not all UPPER CASE. Failure messages should be unique, so that the exact
failure can be pinpointed. This is often accomplished by including a
subtest identifier in the messages.
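For example, a C-Test might generate its messages as in the following
sketch (the test name, objective, and message text are invented):

    with Report;
    procedure CXXXAB is
       Value : constant Integer := Report.Ident_Int (2);
    begin
       Report.Test ("CXXXAB", "Check that doubling an integer works");
       if Value * 2 /= 4 then
          Report.Failed ("Wrong result from multiplication - Subtest 1");
       end if;
       if Value + Value /= 4 then
          Report.Failed ("Wrong result from addition - Subtest 2");
       end if;
       Report.Result;
    end CXXXAB;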
Tests can combine multiple objectives if a test
for a single objective is very short. However, the objectives should
be related, and no more than 4 objectives should be tested in a single
test (to avoid creating gigantic tests that are hard to understand and
use). Objectives from different clauses should never be combined, as
that makes it hard to find the associated test (it will necessarily be
filed in the wrong clause for one of the objectives).
When possible, tests should define (and thus share)
foundation code (see
4.1.4). Foundation packages
are a better alternative than creating large tests with many objectives
when the primary reason for combining the objectives is to avoid writing
set-up code multiple times. Foundation code is specific to tests for
a particular clause, however, so this technique cannot be used to combine
objectives from multiple clauses.
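For instance, set-up declarations needed by several tests for a clause
might be placed in a foundation package along these lines (hypothetical
names; real foundation names follow the conventions of 4.1.4):

    package FXXXA00 is                    -- Foundation shared by the tests.
       type Color is (Red, Green, Blue);
       type Item is record
          Hue   : Color   := Red;
          Count : Natural := 0;
       end record;
    end FXXXA00;

    with FXXXA00;
    with Report;
    procedure CXXXA01 is                  -- One of the tests using it.
       Object : FXXXA00.Item;
    begin
       Report.Test ("CXXXA01", "Check ... (objective text)");
       -- ... test code using Object ...
       Report.Result;
    end CXXXA01;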
When a rule includes a term defined elsewhere,
testing of the rule should include testing all of the combinations implied
by the term. For instance, if we have a definition like "something
is either this, that, or fuzzy", then a test for a test objective
involving something should test cases where something is this and that
and fuzzy. If multiple layers of definitions make this impractical, then
a wide selection of combinations (as different as possible) should be
tried. The only exception to this rule is if separate tests of the definition
exist or should exist (that is, there is a test objective to test that
the definition is appropriately implemented).
When appropriate, tests should try a variety of
things. For instance, when testing subprograms, both procedures and functions
should be tested, with varying numbers of parameters, and with different
modes and types. Similarly, types should be more than just Integer:
tagged types, tasks, protected types, and anonymous access types should
be tried. However, adding variety should not be used as an excuse to
create multiple tests for an objective when one will do. That is, variety
is a secondary goal; exhaustive coverage of possibilities isn't needed
(unless the testing includes testing a term defined elsewhere, as described
in the previous guideline). Remember that the goal isn't to test combinations
of features; the point of using variety is to ensure that the objective
being tested works in more than just the simplest cases.
Tests should generally use only 7-bit ASCII characters.
However, some tests will need to use other characters in
order to test Wide_Wide_Character support, Unicode characters in identifiers,
and the like. Such tests should be encoded in UTF-8 and start with a
UTF-8 Byte Order Mark. Tests should only use the code points that were
assigned in version 4.0 of the Unicode standard; when possible, using
only the roughly 680 characters generally available on US Windows systems
is recommended.
When constructing tests that verify that run-time checks are performed,
take special care that the permissions of 11.6 don't
render the test impotent. 11.6(5) allows language-defined checks to be
optimized away if the result of the operation is not used (even if the
exception is handled). That means it is critical that the values that
fail checks are used in some way afterwards (even though a correct program
will never execute that code). Failure to do that could allow a compiler
to optimize the entire test away, and that would require the test to
be corrected later.
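For example, a test that checks for Constraint_Error can use the value
that failed the check in its failure message, so the check cannot be
removed under 11.6(5) (a sketch; the test name and subtype are invented):

    with Report;
    procedure CXXXAC is
       subtype Small is Integer range 1 .. 10;
       X : Small;
    begin
       Report.Test ("CXXXAC", "Check that range checks are performed");
       begin
          X := Report.Ident_Int (11);     -- Should raise Constraint_Error.
          -- Use X afterwards so 11.6(5) does not permit removing the check:
          Report.Failed ("Constraint_Error not raised; X =" &
                         Integer'Image (X));
       exception
          when Constraint_Error =>
             null;                        -- Expected.
       end;
       Report.Result;
    end CXXXAC;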
When creating a B-Test for which different parts
test different errors, each error should identify the intended failure.
The standard error indication includes a colon; any needed text can follow
that colon. If the error identification will not fit on one line, place
it elsewhere and refer to it by an index. One common way to do this is to put
a list of intended errors into the header, labeling each with a letter.
Then each error comment can just identify the letter of the intended
error.
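For example (a sketch with invented errors; the error comments follow the
conventions for B-Tests described in 4.6):

    -- The intended errors are:
    --    (A) Wrong type in an assignment.
    --    (B) Reference to an undefined identifier.

    procedure BXXXAD is
       I : Integer;
    begin
       I := "Text";                       -- ERROR: (A)
       I := Undefined + 1;                -- ERROR: (B)
    end BXXXAD;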
Tests that cover test objectives that are documented
as untested are especially welcome as test submissions. Tests that cover
previously tested objectives are less likely to be included in the test
suite.