Seng 621 Winter 1999
|This web document, an extension of a presentation for S. Eng. 623, provides an introduction to software testing. It covers the basic methods of black box and white box testing, as well as the different test levels (unit, integration, system, etc.) and how each level builds on the previous one. A brief discussion of software testing metrics is presented. The challenges facing software testing in an organization are explored, and the question of testing versus software inspections is discussed. Finally, we present a look at fault-based testing methods, a testing strategy that is gaining in popularity.|
Introduction to Software Testing
|Software testing is a vital part of the software lifecycle. To understand its role, it is instructive to review the definitions of software testing found in the literature.
Among alternative definitions of testing are the following:
"... the process of exercising or evaluating a system or system component by manual or automated means to verify that it satisfies specified requirements or to identify differences between expected and actual results ..."
(ANSI/IEEE Standard 729, 1983).
"... any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results. Testing is the measurement of software quality ..."
(Hetzel, W., The Complete Guide to Software Testing, QED Information Sciences Inc., 1984).
"... the process of executing a program with the intent of finding errors..."
(Myers, G. J., The Art of Software Testing, Wiley, 1979).
Of course, none of these definitions claims that testing shows that software is free from defects. Testing can show the presence, but not the absence, of problems.
According to Humphrey, software testing is defined as 'the execution of a program to find its faults'. Thus, a successful test is one that finds a defect. This sounds simple enough, but there is much to consider when we want to do software testing. Besides finding faults, we may also be interested in testing performance, safety, fault-tolerance or security.
Testing often becomes a question of economics. For projects of a large size, more testing will usually reveal more bugs. The question then becomes when to stop testing, and what is an acceptable level of bugs. This is the question of 'good enough software'.
It is important to remember that testing assumes that requirements are already validated.
|White Box Testing||White box testing is performed to reveal problems with the internal structure of a program. This requires the tester to have detailed knowledge of the internal structure. A common goal of white-box testing is to ensure a test case exercises every path through a program. A fundamental strength that all white box testing strategies share is that the entire software implementation is taken into account during testing, which facilitates error detection even when the software specification is vague or incomplete. The effectiveness or thoroughness of white-box testing is commonly expressed in terms of test or code coverage metrics, which measure the fraction of code exercised by test cases.|
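As a small illustration of the coverage idea, consider the hypothetical function below: a white-box tester reads the code and picks inputs so that every branch is executed at least once (a coverage tool such as coverage.py could then report the fraction of code actually reached). The function and the chosen inputs are invented for this sketch.

def classify_score(score):
    """Invented example: classify a numeric score, with an error-handling branch."""
    if score < 0 or score > 100:
        raise ValueError("score out of range")
    if score >= 50:
        return "pass"      # branch taken for passing scores
    return "fail"          # branch taken for failing scores

# White-box test cases: one input per branch, chosen by reading the code above.
assert classify_score(75) == "pass"
assert classify_score(20) == "fail"
try:
    classify_score(150)
except ValueError:
    pass                   # error-handling branch exercised
else:
    raise AssertionError("expected ValueError for an out-of-range score")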
|Black Box Testing||Black box tests are performed to assess how well a
program meets its requirements, looking for missing or
incorrect functionality. Functional tests typically
exercise code with valid or nearly valid input for which
the expected output is known. This includes concepts such
as 'boundary values'.
Performance tests evaluate response time, memory usage, throughput, device utilization, and execution time. Stress tests push the system to or beyond its specified limits to evaluate its robustness and error handling capabilities. Reliability tests monitor system response to representative user input, counting failures over time to measure or certify reliability.
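To make the boundary-value idea concrete, suppose a hypothetical specification states that order quantities from 1 to 100 are accepted. A black-box tester works only from that statement, probing the values on and just beyond each limit; the function here is an invented stand-in for the system under test.

def accept_order(quantity):
    """Stand-in for the system under test; the spec says 1 to 100 items are accepted."""
    return 1 <= quantity <= 100

# Boundary values: just below, on, and just above each specified limit.
cases = [(0, False), (1, True), (2, True), (99, True), (100, True), (101, False)]
for quantity, expected in cases:
    assert accept_order(quantity) == expected, (quantity, expected)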
|Different Levels of Test||
The following paragraphs describe the testing activities from the 'second half' of the software lifecycle.
|Unit Testing||Unit testing exercises a unit in isolation from the
rest of the system. A unit is typically a function or
small collection of functions (libraries, classes),
implemented by a single developer.
The main characteristic that distinguishes a unit is that it is small enough to test thoroughly, if not exhaustively. Developers are normally responsible for the testing of their own units and these are normally white box tests. The small size of units allows a high level of code coverage. It is also easier to locate and remove bugs at this level of testing.
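A minimal sketch of a developer's unit test, using Python's standard unittest module; the word_count function is an invented stand-in for the unit being tested.

import unittest

def word_count(text):
    """The unit under test: count whitespace-separated words."""
    return len(text.split())

class WordCountTest(unittest.TestCase):
    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

    def test_several_words(self):
        self.assertEqual(word_count("to be or not to be"), 6)

    def test_extra_whitespace(self):
        self.assertEqual(word_count("  spaced   out  "), 2)

if __name__ == "__main__":
    unittest.main()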
|Integration Testing||One of the most difficult aspects of software
development is the integration and testing of large,
untested sub-systems. The integrated system frequently
fails in significant and mysterious ways, and it is
difficult to fix.
Integration testing exercises several units that have been combined to form a module, subsystem, or system. Integration testing focuses on the interfaces between units, to make sure the units work together. The nature of this phase is certainly 'white box', as we must have a certain knowledge of the units to recognize if we have been successful in fusing them together in the module.
There are three main approaches to integration testing: top-down, bottom-up and 'big bang'. Top-down combines, tests, and debugs top-level routines that become the test 'harness' or 'scaffolding' for lower-level units. Bottom-up combines and tests low-level units into progressively larger modules and subsystems. 'Big bang' testing is, unfortunately, the prevalent integration test 'method'. This is waiting for all the module units to be complete before trying them out together.
Integration tests can rely heavily on stubs and drivers. Stubs stand in for unfinished or not-yet-integrated subroutines or sub-systems. A stub might consist of a function header with no body, or it may read and return test data from a file, return hard-coded values, or obtain data from the tester. Stub creation can be a time-consuming part of the testing effort.
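For instance, suppose a reporting module must be integrated before the pricing subsystem it calls is finished. A stub with the same interface, returning hard-coded values, lets the integration test proceed; all names here are invented for the sketch.

def get_price_stub(item_id):
    """Stub standing in for the unfinished pricing subsystem."""
    canned_prices = {"A100": 9.99, "B200": 24.50}
    return canned_prices.get(item_id, 1.00)   # hard-coded test data

def order_total(items, get_price=get_price_stub):
    """Module under integration test: totals an order using the pricing interface it is given."""
    return sum(get_price(item_id) * quantity for item_id, quantity in items)

# Exercises the interface between the order code and the (stubbed) pricing subsystem.
assert order_total([("A100", 2), ("B200", 1)]) == 2 * 9.99 + 24.50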
The cost of drivers and stubs in the top-down and bottom-up testing methods is what drives the use of 'big bang' testing. This approach waits for all the modules to be constructed and tested independently, and when they are finished, they are integrated all at once. While this approach is very quick, it frequently reveals more defects than the other methods. These errors have to be fixed and as we have seen, errors that are found 'later' take longer to fix. In addition, like bottom up, there is really nothing that can be demonstrated until later in the process.
|External Function Testing||The 'external function test' is a black box test to verify the system correctly implements specified functions. This phase is sometimes known as an alpha test. Testers will run tests that they believe reflect the end use of the system.|
|System Testing||The 'system test' is a more robust version of the external function test, and may also be referred to as an alpha test. The essential difference between 'system' and 'external function' testing is the test platform. In system testing, the platform must be as close as possible to production use in the customer's environment, including factors such as hardware setup and database size and complexity. By replicating the target environment, we can more accurately test 'softer' system features (performance, security and fault-tolerance).
Because of the similarities between the test suites in the external function and system test phases, a project may leave one of them out. It may be too expensive to replicate the user environment for the system test, or we may not have enough time to run both.
|Acceptance Testing||An acceptance (or beta) test is an exercise of a completed system by a group of end users to determine whether the system is ready for deployment. Here the system will receive more realistic testing than in the 'system test' phase, as the users have a better idea of how the system will be used than the system testers do.|
|Regression Testing||Regression testing is an expensive but necessary
activity performed on modified software to provide
confidence that changes are correct and do not adversely
affect other system components. Four things can happen when a developer attempts to fix a bug. Three of these things are bad, and one is good:
- the fix resolves the defect (the good outcome);
- the fix fails to resolve the defect;
- the fix resolves the defect but breaks something that previously worked;
- the fix fails to resolve the defect and breaks something else as well.
Because of the high probability that one of the bad outcomes will result from a change to the system, it is necessary to do regression testing.
It can be difficult to determine how much re-testing is needed, especially near the end of the development cycle. Most industrial testing is done via test suites: automated sets of procedures designed to exercise all parts of a program and to reveal defects. While the original suite could be used to test the modified software, this might be very time-consuming. A regression test selection technique chooses, from an existing test set, the tests that are deemed necessary to validate modified software.
There are three main groups of test selection approaches in use.
An interesting approach to limiting test cases is based on whether we can confine testing to the "vicinity" of the change. (Ex. If I put a new radio in my car, do I have to do a complete road test to make sure the change was successful?) A new breed of regression test theory tries to identify, through program flows or reverse engineering, where boundaries can be placed around modules and subsystems. These graphs can determine which tests from the existing suite may exhibit changed behavior on the new version.
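A minimal sketch of one such selection scheme, assuming we already know (from coverage data or a dependency graph) which modules each existing test exercises; the test and module names are invented.

# Which modules each existing test exercises (e.g. taken from coverage data).
tests_to_modules = {
    "test_login":   {"auth", "session"},
    "test_billing": {"billing", "database"},
    "test_reports": {"reports", "database"},
}

def select_regression_tests(changed_modules):
    """Re-run only the tests whose covered modules overlap the changed ones."""
    return [name for name, modules in tests_to_modules.items()
            if modules & changed_modules]

# If only the billing code changed, the login tests can (in principle) be skipped.
print(select_regression_tests({"billing"}))   # ['test_billing']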
Regression testing has been receiving more attention as corporations focus on fixing the 'Year 2000 Bug'. The goal of most Y2K work is to correct the date handling portions of a system without changing any other behavior. A new 'Y2K' version of the system is compared against a baseline original system. With the obvious exception of date formats, the behavior of the two versions should be identical. This means not only do they do the same things correctly, they also do the same things incorrectly. A non-Y2K bug in the original software should not have been fixed by the Y2K work.
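A minimal sketch of this kind of baseline comparison, with placeholder functions standing in for whatever harness drives the original and the Y2K-corrected versions on the same recorded inputs.

def run_baseline(test_input):
    """Placeholder: drive the original system on one recorded input and capture its output."""
    return f"processed {test_input}"

def run_y2k_version(test_input):
    """Placeholder: drive the Y2K-corrected system on the same input."""
    return f"processed {test_input}"

recorded_inputs = ["order 17", "order 18", "statement 1999-12-31"]

# Apart from date handling, the two versions should behave identically, bug for bug.
mismatches = [i for i in recorded_inputs if run_baseline(i) != run_y2k_version(i)]
assert not mismatches, f"behaviour changed for: {mismatches}"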
A frequently asked question about regression testing is 'The developer says this problem is fixed. Why do I need to re-test?', to which the answer is 'The same person probably told you it worked in the first place'.
|Installation Testing||The testing of full, partial, or upgrade install/uninstall processes.|
|Completion Criteria||There are a number of different ways to determine whether the test phase of the software life cycle is complete. Some common examples are: all planned test cases have been run without failure; a target level of code coverage has been reached; the rate at which new defects are being discovered has fallen below an agreed threshold; or, less happily, the time or budget scheduled for testing has simply run out.
When we begin to talk about completion criteria, we move naturally into a discussion of software testing metrics.
|Goals||As stated above, the major goal of testing is to discover errors in the software. A secondary goal is to build confidence that the system will work without error when testing does not reveal any errors. But what does it mean when testing does not detect any errors? Either the software is of high quality, or the testing process is of low quality. We need metrics on our testing process if we are to tell which is the case.
As with all domains of the software process, there are hosts of metrics that can be used in testing. Rather than discuss the merits of specific measurements, it is more important to know what they are trying to achieve.
Three themes prevail: quality assessment, risk management, and process improvement.
|Quality Assessment||An important question in the testing process is
"when should we stop?" The answer is when
system reliability is acceptable or when the gain in
reliability cannot compensate for the testing cost. To
answer either of these concerns we need a measurement of
the quality of the system.
The most commonly used means of measuring system quality is defect density. Defect density is represented by:

    defect density = (number of known defects) / (system size)
where system size is usually expressed in thousands of lines of code (KLOC). Although it is a useful indicator of quality when used consistently within an organization, there are a number of well-documented problems with this metric. The best known relate to inconsistent definitions of what counts as a defect and of how system size is measured.
Defect density accounts only for defects that are found in-house or over a given amount of operational field use. Other metrics attempt to estimate how many defects remain undetected. A simplistic case of error estimation is based on "error seeding". We assume the system has X errors. It is artificially seeded with S additional errors. After testing, we have discovered Tr 'real' errors and Ts seeded errors. If we assume (a questionable assumption) that the testers find the same proportion of seeded errors as real errors, then Tr / X = Ts / S, and we can estimate X:

    X = (Tr * S) / Ts
For example, if we find half the seeded errors, then the number of 'real' defects found represents half of the total defects in the system.
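A small worked example of the estimate (the figures are invented):

def estimate_real_errors(real_found, seeded_found, seeded_total):
    """Error-seeding estimate: X = (Tr * S) / Ts, assuming equal detection rates."""
    return real_found * seeded_total / seeded_found

# Seed 20 artificial errors; testing finds 10 of them (half) along with 35 real errors.
x = estimate_real_errors(real_found=35, seeded_found=10, seeded_total=20)
print(x)        # 70.0 real errors estimated in total
print(x - 35)   # so roughly 35 real errors are estimated to remain undetected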
Estimating the number and severity of undetected defects allows informed decisions on whether the quality is acceptable or additional testing is cost-effective. It is very important to consider maintenance costs and redevelopment efforts when deciding on the value of additional testing.
|Risk Management||Metrics involved in risk management measure how
important a particular defect is (or could be). These
measurements allow us to prioritize our testing and
repair cycles. A truism is that there is never enough
time or resources for complete testing, making
prioritization a necessity.
One approach is known as Risk Driven Testing, where Risk has specific meaning. The failure of each component is rated by Impact and Likelihood. Impact is a severity rating, based on what would happen if the component malfunctioned. Likelihood is an estimate of how probable it is that the component would fail. Together, Impact and Likelihood determine the Risk for the piece.
Obviously, higher ratings on each scale correspond to higher overall risk from defects in the component. With a numeric rating scale for each factor, this is often represented visually as a matrix, with likelihood on one axis and impact on the other; components falling in the high-impact, high-likelihood corner are tested first.
The relative importance of likelihood and impact will vary from project to project and company to company.
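A minimal sketch of this prioritization, under the common assumption that risk is scored as impact multiplied by likelihood; the components and their 1-to-5 ratings are invented.

# Each component is rated for impact and likelihood on a 1 (low) to 5 (high) scale.
components = {
    "payment processing": {"impact": 5, "likelihood": 3},
    "report formatting":  {"impact": 2, "likelihood": 4},
    "audit logging":      {"impact": 4, "likelihood": 1},
}

def risk(ratings):
    """One common scoring scheme: risk = impact x likelihood."""
    return ratings["impact"] * ratings["likelihood"]

# Plan testing effort starting with the riskiest components.
for name, ratings in sorted(components.items(), key=lambda kv: risk(kv[1]), reverse=True):
    print(name, risk(ratings))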
A system-level measurement for risk management is the Mean Time To Failure (MTTF). Test data sampled from realistic beta testing is used to find the average time until system failure. This data is extrapolated to predict overall uptime and the expected time the system will be operational. Sometimes measured along with MTTF is Mean Time To Repair (MTTR). This represents the expected time until the system will be repaired and back in use after a failure is observed. Availability, obtained by calculating MTTF / (MTTF + MTTR), is the probability that a system is available when needed. While these are reasonable measures for assessing quality, they are more often used to assess the risk (financial or otherwise) that a failure poses to a customer, or in turn to the system supplier.
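Using the formula above with invented figures:

mttf_hours = 400.0   # mean time to failure observed during testing
mttr_hours = 8.0     # mean time to repair after a failure

availability = mttf_hours / (mttf_hours + mttr_hours)
print(round(availability, 3))   # 0.98, i.e. the system is expected to be usable about 98% of the time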
|Process Improvement||It is generally accepted that to achieve improvement you need a measure against which to gauge performance. To improve our testing processes, we need the ability to compare the results of one process with another.
Popular measures of the testing process report the number of defects found by each test activity, the effort expended on testing, and derived figures for test efficiency (defects found per unit of testing effort) and test effectiveness (the proportion of all known defects that were found by testing rather than by customers).
It is also important to consider system failures reported in the field by customers. If a high percentage of customer-reported defects were not revealed in-house, it is a significant indicator that the testing process is incomplete.
A good defect reporting structure will allow defect types and origins to be identified. We can use this information to improve the testing process by altering and adding test activities to improve our chances of finding the defects that are currently escaping detection. By tracking our test efficiency and effectiveness, we can evaluate the changes made to the testing process.
Testing metrics give us an idea of how reliable our testing process has been at finding defects, and can be a reasonable indicator of its performance in the future. It must be remembered that measurement is not the goal; improvement through measurement, analysis and feedback is what is needed.
|Test Groups||There are both pros and cons to maintaining separate test groups.
The key to optimizing the use of separate test groups is understanding that developers are able to find certain types of bugs very efficiently, and testers have greater abilities in detecting other bugs. An important consideration would be the size of the organization, and the criticality of the product.
|Testing Problems||When trying to implement software testing effectively, there are several mistakes that organizations typically make. The errors fall into (at least) four broad categories:
Misunderstanding the role of testing.
The purpose of testing is to discover defects in the product. Furthermore, it is important to have an understanding of the relative criticality of defects when planning tests, reporting status, and recommending actions.
Poor planning of the testing effort.
Test plans often over-emphasize testing functionality at the expense of potential interactions. This mentality can also lead to incomplete configuration testing and inadequate load and stress testing. Neglecting to test documentation and/or installation procedures is also a risky decision.
Using the wrong personnel as testers.
The role of testing should not be relegated to junior programmers, nor should it be a place to employ failed programmers. A test group should include domain experts, and need not be limited to people who can program. A test team that lacks diversity will not be as effective.
Poor testing methodology.
Just as programmers often prefer coding to design, testers can be too focussed on running tests at the expense of designing them. The tests must verify that the product does what it is supposed to do, while not doing what it should not. Likewise, using code coverage as a performance goal for testers, or ignoring coverage entirely, are both poor strategies.
|Testing vs. Inspections||Inspections are undoubtedly a critical tool to detect and prevent defects. Inspections are strict and close examinations conducted on specifications, designs, code, test plans, and other artifacts. An important point about inspections is that they can be performed much earlier in the design cycle, well before testing begins. Having said that, testing is something that can be started much earlier than is normally the case. Testers can review their test plans with developers as they are creating their designs, making the developers more aware of potential defects so they can act accordingly. In any case, early detection of defects is critical: the closer to the time of its creation that we detect and remove a defect, the lower the cost, both in terms of time and money.
Evidence of the benefits of inspections abounds in the literature (Humphrey 1989).
In the face of all this evidence, it has been suggested that "software inspections can replace testing". While the benefits of inspections are real, they are not enough to replace testing. Inspections could replace testing if and only if all information gleaned through testing could be obtained through inspection. This is not true for several reasons. First, testing can identify defects due to complex interactions in large systems (e.g. timing and synchronization). While inspections can in principle detect such problems, as systems become more complex the chance that one person understands all the interfaces and is present at all the reviews becomes quite small.
Second, testing can provide a measure of software reliability (i.e. failures per unit of execution time) that is unobtainable from inspections; this measure is often a vital input to the release decision. Third, testing identifies system-level performance and usability issues that inspections cannot. Therefore, since inspections and testing provide different, equally important information, one cannot replace the other. However, depending on the product, the optimal mix of inspections and testing may differ.
|Fault-Based Testing||The following paragraphs describe some newer techniques in the software testing field. Fault-based methods include error-based testing, fault seeding, mutation testing, and fault injection, among others. After briefly describing each of the four techniques, fault injection will be discussed in more detail.
These methods attempt to address the belief that current techniques for assessing software quality are not adequate, particularly in the case of mission-critical systems. Voas et al. suggest that the traditional belief that improving and documenting the software development process will increase software quality is not sufficient on its own. Yet they recognize that the amount of testing (which is product-focused) required to demonstrate high reliability is impractical. In short, quality processes cannot demonstrate reliability, and the testing necessary to do so is impossible to perform.
Fault injection is not a new concept. Hardware design techniques have long used inserted fault conditions to test system behavior. It is as simple as pulling the modem out of your PC during use and observing the results to determine if they are safe and/or desired. The injection of faults into software is not so widespread, though it would appear that companies such as Hughes Information Systems, Microsoft, and Hughes Electronics have applied the techniques or are considering them. Properly used, fault insertion can give insight as to where testing should be concentrated, how much testing should be done, whether or not systems are fail-safe, etc.
As a simple example, consider code that computes a value T from an input X. In this case it is catastrophic if T > 100. By using perturb(x) to generate changed values of X (i.e. a random number generator), you can quickly determine how often corrupted values of X lead to undesired values of T.
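A minimal sketch of such a perturbation experiment, assuming a hypothetical compute_T(x) and the catastrophic threshold of T > 100 mentioned above:

import random

def compute_T(x):
    """Hypothetical computation under study; T must not exceed 100."""
    return 2 * x + 10

def perturb(x):
    """Inject a fault: corrupt x with a random offset."""
    return x + random.uniform(-50, 50)

trials = 10_000
nominal_x = 30   # a normal, in-range input

catastrophic = sum(1 for _ in range(trials) if compute_T(perturb(nominal_x)) > 100)

# Fraction of corrupted inputs that drive T into the catastrophic region.
print(catastrophic / trials)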
The technique can be applied to internal source code, as well as to third-party software, which may be a "black box".
|Conclusion||Software testing is an important part of the software development process. It is not a single activity that takes place after code implementation, but is part of each stage of the lifecycle. A successful test strategy will begin with consideration during requirements specification. Testing details will be fleshed out through the high- and low-level system designs, and testing will be carried out by developers and separate test groups after code implementation.
As with the other activities in the software lifecycle, testing has its own unique challenges. As software systems become more and more complex, the importance of effective, well planned testing efforts will only increase.
|Software Testing Seng 621 Winter 1999||Simon Dyck