Sequoia
The sequoia
testing framework is designed for scale and is platform-independent, highly automated, robust, extensible, malleable, semantics-aware, allocator-aware, generics-friendly and expressive.
To operate effectively at scale, sequoia
incorporates the following features:
- The prune option will ensure that only those tests dependent on changes since the last run are executed. Changes may be either to source files or to materials exploited by the test. In particular, given a test file Tests/Foo/Bar.cpp, any testing materials should be stored in TestMaterials/Foo/Bar.
- Tests may be run concurrently, by default via a std algorithm invoked with the parallel execution policy. For libc++, a thread pool supplied by sequoia is utilized instead. This may be employed on any platform by running with --thread-pool <desired num threads>.
- To detect instabilities, locate-instabilities is called with an integer, specifying the number of times tests should be run. Instabilities are pinned down to the line of test code where they first manifest. By default, everything is run within the same program, allowing detection of changes to static data. However, it's conceivable that these changes are desired. If appropriate, use the --sandbox flag to ensure that each run is done in an independent process.
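For example, assuming the test executable is the TestAll target discussed below, illustrative invocations might look something like

TestAll prune --thread-pool 8
TestAll locate-instabilities 5 --sandbox

though the precise manner in which these modes and options combine is not spelled out here and should be checked against the framework itself.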
Having cloned the sequoia repository, the testing framework may be built with CMake. A good way to get started is to navigate to the TestFrameworkDiagnostics folder and invoke CMake with the appropriate generator. This has been tested with MSVC, clang and gcc. Appropriate commands may look something like
cmake -G "Visual Studio 17 2022" -B ..\build\CMade\win\TestFrameworkDiagnostics
or
cmake -D CMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++ -B ../build/CMade/clang/TestFrameworkDiagnostics
depending on the system and precise set-up. Invoking the resulting executable runs a minimal set of diagnostics. A richer set may be found in TestAll.
As described in more detail below, certain output arising from running tests is written to files which should be placed under version control. This presents additional difficulties in guaranteeing platform independence. For example, the files may contain absolute paths or demangled type names, which could differ from machine to machine and/or platform to platform. For files placed under version control this is particularly problematic. Therefore, various mechanisms have been put in place to mitigate these problems: see Robustness, Malleability and Generics-Friendliness.
The processes of initializing a new project and creating new tests are highly automated. Both are integrated with CMake. For example, running with the command line arguments
init "Jo Bloggs" "Some/Absolute/Path" "\t"
will do the following:
- invoke CMake to build the new project;
- initialize a git repository and add the newly created project files.
Additionally, on Windows:
Once a new project has been created, it is natural to add code and tests. This can all be done automatically by utilizing the create mode, which will add the appropriate files, amend the CMakeLists.txt files and run CMake.
The testing framework is sophisticated and sophistication brings danger, since it raises the chances that the framework itself has bugs. If this is the case, then the framework would give clients a false sense of security about the code they have written, which could be disastrous. To mitigate this risk, the testing framework has been designed to run self-diagnostics. This has been employed to give a high degree of confidence that the existing framework is correct. Used idiomatically, it should confer similar confidence in any extensions.
One of the defining features of the testing framework is that it is designed to expose false negatives. As such, each test operates in a particular test_mode, chosen at compile time: standard, false_negative and false_positive.
In standard mode, the test framework operates as one might expect. A typical check looks as follows:
check(equality, "Description of test", x, 5);
If x==5
the check passes, whereas if x!=5
a failure is reported. However, suppose that check
has a bug. For example, it might never report failure. To counter this, tests can be created to be run in false-negative mode. In this case, the above example will pass when x!=5
and fail when x==5
. The purpose of this is to pick up bugs in the testing framework itself: false negatives.
In standard mode, when a test fails, details of the failure will be given directly to the client. In the above example, for the case where x==4
, something like this will be seen:
Obtained : 4
Predicted: 5
In false-negative mode, this output is not made directly visible to the user, since the false-negative test has succeeded. Instead, it is dumped to a file. This means that clients can check whether the underlying failure which the false-negative check has detected is as expected. It is good practice to place this file under version control. This provides sensitivity both to changes in the false-negative test and also changes to the way in which the testing framework generates output.
Clients may extend the testing framework to conveniently test their types, for example by specializing the value_tester (see Extensibility). This is a perfect opportunity to write some false-negative tests to give confidence that the newly-added code is not spuriously reporting success.
Finally, there are false-positive tests. They are essentially the same as standard mode tests; however, they are treated separately since statements like
check(equality, "Description of test", 5, 5);
are morally tests of the testing framework itself, rather than tests of client code. As with false-negative tests, output is dumped to an auxiliary file, primarily as a means of detecting (via version control) changes to the way the testing framework generates output.
Just as the output of false-positive and, especially, false-negative tests is written to files, so too is the output from exception checks. For example, consider the call
check_exception_thrown<std::runtime_error>("Description", f);
where f
is invocable without any arguments. This check will pass if f
throws an exception of the expected type - here, std::runtime_error
. To obtain higher fidelity, the message associated with the exception is written to a file, again best placed under version control. First and foremost, this allows clients to check that the exception thrown is the one expected. After all, it may be that f
can fail in a number of ways, each with the same exception type. Only by checking the associated message can one be sure that the failure is the one intended to be tested. Secondly, by placing the file under version control, changes either to the text of the exception or to which exception is actually thrown can be detected.
There are cases where an exception message may differ between platforms. For example, consider the case where an exception is thrown reporting that a file, specified by an absolute path, does not exist. Since this is written to a file placed under version control, without mitigation, running the associated test on different platforms will generate unwanted diffs. To mitigate this, check_exception_thrown
comes with an optional final argument: a function object which post-processes the exception message, prior to writing to disk.
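For instance, a post-processor might strip the machine-specific portion of a path from the message before it is written to disk. In the following sketch, the shape of the post-processor (a function object taking and returning a std::string) is an assumption, and the throwing lambda stands in for a genuine operation under test:

check_exception_thrown<std::runtime_error>(
    "Report missing file",
    [](){ throw std::runtime_error{"/usr/local/data/input.txt not found"}; },
    [](std::string message){
        // Keep only the stable, project-relative tail of the path
        if(const auto pos{message.find("data/")}; pos != std::string::npos)
            message.erase(0, pos);
        return message;
    });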
Should the need arise, the class template exception_message_extractor is provided to allow customized extraction of error messages from exceptions of arbitrary types.
There are several ways in which the testing framework may be extended. Suppose that a client has created a new type, my_type
. One way or another, it is almost inevitable that calls will be made to check
. If the type is sufficiently simple and defines operator==, it suffices for clients to specialize the class template serializer. This defines how to serialize instances of the class, and so we may again end up with a failure report along the lines of the above. However, if the class is more complex, there is a superior alternative.
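As a minimal sketch, such a specialization might look as follows; the required interface is assumed here to be a static function, make, returning a std::string, and value() is a hypothetical accessor of my_type:

template<>
struct serializer<my_type>
{
    // Assumed interface: produce the string used in failure reports
    [[nodiscard]] static std::string make(const my_type& t)
    {
        return std::to_string(t.value());
    }
};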
sequoia
defines a class template, value_tester. One purpose of this is to provide a customization point where the const
accessors of two instances of a class can be used to probe purported equality. Suppose that a client has created a new container and consider comparing two instances. operator==
may return false
or true
. In the first case, we want to get to the bottom of this in a more appropriate manner than simply serializing the class (which may produce excessive output). For a container, it would make sense to compare the sizes and then to use iterators to compare the elements and find any that differ. However, it is worth doing this even if operator== returns true; after all, the aim is to test with high fidelity and it may be that operator== has a bug; or perhaps it is fine but there's a bug in the const accessors. Either way, a well-designed definition of the static method
void value_tester<my_type>::check(equality_check_t, std::string_view description, const my_type& obtained, const my_type& prediction);
will catch these potential inconsistencies.
However, there is a gap here. Consider the first time a type is instantiated in a test, say
my_type x{5};
It is desirable to check that x
is correctly initialized, but circular to write something like
check(equality, "x correctly initialized", x, my_type{5});
Therefore, in addition to supplying a static check(equality_check_t, ...)
method, we can supply an overload, check(equivalence_check_t, ...)
. Suppose that, in this example, my_type
simply wraps an int
. In this case we would define the static method
void value_tester<my_type>::check(equivalence_check_t, std::string_view description, const my_type& obtained, int prediction);
which will automatically be called when the following line of code is invoked:
check(equivalence, "x correctly initialized", x, 5);
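Putting this together, the shape of the specialization for the int-wrapping my_type is along the following lines; the declarations mirror the two signatures quoted above, with bodies omitted since the mechanism for reporting nested failures is framework-specific:

template<>
struct value_tester<my_type>
{
    // Probes two fully-fledged instances against one another via their const accessors
    static void check(equality_check_t, std::string_view description, const my_type& obtained, const my_type& prediction);

    // Checks an instance against the raw int from which it should have been constructed
    static void check(equivalence_check_t, std::string_view description, const my_type& obtained, int prediction);
};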
There are circumstances in which it is appropriate to consider equivalence of a type not with another type, but with itself. For example, consider std::filesystem::path
. Equality of two instances would amount to the paths being identical. One reasonable definition of two equivalent paths is that they point to a filesystem object with the same name and the same contents, but potentially in different locations. This is the convention used by sequoia.
In some circumstances, it may feel more logically comfortable to check weak equivalence instead of equivalence (done in the obvious way, adhering to the pattern above). For example, suppose the goal is to compare two values of std::function<R (Args...)>. Statically, it is known that the signatures are the same. However, at runtime, all that can be readily checked is whether a given std::function is null or not. Therefore, it may be reasonable to say that two instances of std::function<R (Args...)> are weakly equivalent if they are either both null or both not null.
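Adhering to the stated pattern, such a check would presumably take the form

check(weak_equivalence, "Both null or both non-null", f, g);

where f and g are instances of std::function<R (Args...)>; the weak_equivalence tag is inferred here by analogy with equality and equivalence.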
There is also an entirely different way in which the testing framework may be extended. The checker class template provides basic functionality such as check(equality_check_t, ...)
and check_exception_thrown
. However, the class template accepts a variadic number of extenders, which enhance its functionality. Examples provided with sequoia
are various semantics extenders (see below), and a performance_extender. Extenders may be readily mixed and matched with appropriate using declarations, some of which are supplied with the library.
So far, the customization discussed is static, or type-based. Generally, wherever this is possible, it is to be preferred. However, the framework does support dynamic customization, which can be understood by the following example. Recall that equivalence of instances of std::filesystem::path
is taken to mean that two filesystem objects, potentially in different locations, have the same name and the same contents. By default, the contents of a file are determined via de-serialization to a string. However, this may not be appropriate for all file types. Moreover, the particular file types involved in comparisons will only be known at runtime and so are dynamic, as opposed to static. With this in mind, the testing framework supports generalized (weak) equivalence checks. Statically, these facilitate the injection of an arbitrary type which is fed through to the value_tester
, providing dynamic customizations.
By utilizing sequoia
's testing framework, clients are strongly encouraged to think carefully about the semantics of their classes upfront, rather than as an afterthought. For the purposes of this library, a type exhibiting regular semantics is understood to provide copy/move constructors, copy/move assignment, operator==, operator!= and swap. Note that there is no strict requirement for a default constructor. If a class provides this functionality, then the regular_test alias template may be utilized, which makes use of the regular_extender class template. The latter provides an overload of check_semantics: given two instantiations of a class, it checks the consistency of the above list of special member functions / functions. Alternatively, for move-only types (defined as regular types, but with the copy operations removed), the move_only_test may be exploited. Either way, this removes much of the burden of devising ways to carefully check the consistency of these operations, by reliably bundling everything into a call to check_semantics.
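As a sketch based on the above description, a regular test might then contain a call along the following lines; presumably the two instantiations should be distinct, so that the comparison operators can be meaningfully probed:

check_semantics("Semantics of my_type", my_type{1}, my_type{2});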
As a bonus, if both serialization and de-serialization are defined, check_semantics
will check their consistency. Not a single line of extra code need be written to activate the extra check: static reflection determines whether or not it should be performed.
For some tests it is desirable to utilize auxiliary materials, as alluded to in Scalability. Given a test file Tests/Foo/Bar.cpp
, any testing materials should be stored in TestMaterials/Foo/Bar
, using the following conventions. First, if the materials are simply for consumption by the test code, they can be placed directly in the aforementioned directory. However, it may be desirable to perform a comparison with materials stored on disk. In this case, there are three sub-folders with special names: Auxiliary
, WorkingCopy
and Prediction
, of which only the last is necessary. When a test is run that possesses a Prediction
folder, the Auxiliary
and WorkingCopy
folders, should they exist, are copied to a temporary location which is cleared out at the beginning of each test run. If WorkingCopy
does not exist, it is created on the fly. At some point during the test, the contents of WorkingCopy
should be compared with Prediction
; the test framework comes with methods exposing the relevant paths. The Auxiliary
folder is useful for holding materials never intended for comparison. Roughly speaking, differences between the WorkingCopy
and Prediction
folders and any files they may hold will be reported as a test failure.
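In summary, for the Tests/Foo/Bar.cpp example the materials might be laid out as follows, where only Prediction is required:

TestMaterials/Foo/Bar/
    Auxiliary/      support files, never compared
    WorkingCopy/    copied to a temporary location at the start of the run
    Prediction/     the expected state of the working copy after the test

As noted, differences between WorkingCopy and Prediction are reported as test failures.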
However, there are some exceptions to this. First, the following files are ignored: .keep
and .DS_Store
. More interestingly, clients may specify that particular substrings within a file are ignored when comparing file contents. To motivate this, consider end-to-end testing of the testing framework itself. The output includes timings, of the form [2ms]
. The problem is that timings can be expected to change, meaning that a given test may regularly fail in a way which is not useful. To counter this, each file in Prediction and its subfolders can be supplemented by a file of the same name but with the extension seqpat. The latter is expected to contain a list of regular expressions, specifying sub-strings to be ignored during the file comparison process.
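For example, to ignore the timings mentioned above, a file in Prediction could be accompanied by a seqpat file containing a single regular expression along the lines of

\[\d+ms\]

though the precise regex dialect accepted is not specified here, so treat this as illustrative.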
Suppose now that we are left with a genuine failure after preprocessing the files in WorkingCopy
and Prediction
. It may be that the code under test is actually correct, but the predictions need updating. This can be achieved by running the test with the additional argument update
. Updating is discerning: only files exhibiting a failure after preprocessing are updated. This prevents version-controlled diffs from being polluted with noise.
The C++17 allocator framework is powerful but complex. Much of this complexity derives from the intersection of the logical abstraction containers seek to represent with the realities of creating efficient code. Consider std::vector
. This models dynamic, contiguous storage: the logical content of this container is its elements. Indeed, operator==
checks that two vectors are of the same size and that their elements are equal, but no more than this. However, vectors additionally hold allocators, which are not part of the logical abstraction. In principle, the allocator may be stateful and this raises interesting questions. Should operator== take account of this state? The design of std::vector gives a definitive answer: no. But what should happen on assignment? Should the state of the allocator propagate or not? Should the choice be consistent between copy assignment and move assignment? What about swapping? There are no definitive answers to these questions, and so this is left in the hands of the client. Indeed, std::allocator_traits
exposes various type definitions, reflecting this freedom.
With the flexibility of the C++ allocation model comes commensurate difficulty in testing. To help with this, the sequoia testing framework has been built with allocators firmly in mind. An allocator-aware class with regular semantics may be rigorously tested by using a regular_allocation_extender; for the move-only case, move_only_allocation_extender is available. These checkers provide an overload of the check_semantics method which accepts allocation predictions. In the regular case, the latter correspond to copying, assigning (with and without propagation) and mutation, together with para-copy/move. Here and throughout the documentation, para-copy/move refers to the copy-like and move-like constructors which additionally accept an allocator. The design accommodates scoped allocators and multiple allocators (each of which may be either normal or scoped). Regular allocation tests come with a facility for automatically generating all 8 combinations of allocation propagation from a single call to check_semantics
. Move-only allocation tests do likewise for the 4 combinations relevant to move-only types.
An additional subtlety arises from attempting to ensure independence from library implementations and build settings. This is sharpened by the fact that certain operations on the standard containers perform additional allocations in an MSVC debug build, compared to a release build. However, the framework is flexible enough to deal with this. Indeed, if the semantics are such that the container under inspection behaves like a container of values, then the framework automatically shifts the user-supplied allocation predictions to compensate. If, instead, it behaves like a container of pointers, then a line of code registering this fact is sufficient. If the behaviour does not fit into either of these categories, then the framework is flexible enough for clients to specify their own solutions.
The allocation testing framework has been rigorously tested with a combination of false-positive and false-negative checks and has very high fidelity. For example, if check_semantics
is fed instances of an allocating type for which operator==
has accidentally been written such that one (or both) of its arguments is taken by value, rather than by reference, the framework will detect this.
When writing generic code, it is natural to want to test it with a variety of appropriate types. This leads to the idea of templated unit tests; within sequoia
this is most naturally achieved by leaving the test classes plain (rather than templated) but templating various methods inside. However, a challenge with doing this is interpreting failures. The line at which a failure occurs is no longer sufficient to identify it uniquely, since this line may correspond to several different template instantiations.
Consequently, sequoia
reports failures with plenty of type information. For example, the following output is typical of what to expect when checking the equality of two instances of std::pair<int,double>
:
Tests/TestFramework/UnitTestDiagnostics.cpp, Line 51
[std::pair<int, double>]
operator== returned false

Second element of pair is incorrect
[double]
operator== returned false
Obtained : 7.8
Predicted: -7.8
For each sub-check performed as part of the top-level check, type information is reported. This is generated by the class template type_demangler. Clients are welcome to specialize this to provide more readable demangling. Note that the built-in demangling strives to standardize the demangling from MSVC, clang and gcc. This ensures that when type information is written to files under version control, these files are stable under changes of platform.
There is another side to generics-friendliness. Consider the example of testing a class template. Following the examples above, it makes sense to start by testing (weak) equivalence following initialization and subsequently to perform equality checks of two instances following various operations. However, given that this type is templated, it may be that the type with which it is instantiated does not naturally support equality checks. For example, consider std::pair<int, std::function<void ()>>
. Here, the first member supports equality checking but the second only weak equivalence checking. However, the implementation of value_tester
for std::pair
supplied by sequoia
is such that it supports the following call:
check(with_best_available, "", x, y);
where x
and y
are both instances of std::pair<int, std::function<void ()>>. Static reflection is used to invoke the strongest check for each of the wrapped types.
The focus of the testing framework's API is on providing useful primitives, such as check_semantics
, check_exception_thrown
and check
. As discussed above, the latter naturally forms overload sets which allow generic testing of class templates, even where the class is heterogeneous and the types with which it is instantiated may support checks of differing strengths. Clients are encouraged to supply each check with a description, which constitutes the second argument. In addition, there is the facility to provide customized advice in the event of particular failures. For example, within the allocation testing framework, it is possible for a negative number of allocations to be reported. This is rather counter-intuitive and, when encountered for the first time, may raise doubts as to the correctness of the framework. This is hopefully ameliorated by leveraging the advice functionality internally, giving typical output:
Obtained : -1
Predicted: 1
Advice : A negative allocation count generally indicates an allocator propagating when it shouldn't or not propagating when it should.
Clients may provide customized advice by calling many of the check methods with an extra function object, bound to an instance of the tutor class template. If operator()
is overloaded in such a way that it takes two instances of a type and returns a string, then reflection is utilized to apply this function object whenever the appropriate overload may be invoked. For example, consider the case where a function object is supplied to provide advice for int
s, under certain conditions. Suppose this function object is fed to a check performed on a std::vector<int>
. In this case, the function object will be propagated internally until it reaches the point where it can be invoked on the int
s at the end of the chain.
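Based on this description, such a call might look something like the following, where obtained and prediction are instances of std::vector<int>, and constructing tutor directly from a lambda is an assumption:

check(equality, "Description of test", obtained, prediction,
      tutor{[](int, int) -> std::string {
          return "Advice pertinent to int comparisons";
      }});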
One frequent challenge with writing unit tests is their linear structure. Again, consider writing a container. Perhaps one of the first things to test is push_back
. Next, it may be reasonable to check that erase
returns us to where we started. Following this, maybe invoke push_back twice. Should we test erase again? And if so, back to one element or zero elements, or should we do both? Part of the issue is that the more that is done, the more unwieldy the unit test becomes. To alleviate this, sequoia
provides a graph-based testing solution defined in StateTransitionUtilities.hpp. The idea is that nodes correspond to states of the type under test, while directed edges define transitions between them. When invoked, a breadth-first search is performed, invoking specified checks at each node and determining whether the transition from the previous state matches the expected state specified by the latest node. An example may be found in RegularStateTransitionDiagnostics.cpp.