
Test-driven development can be a powerful ally when it comes to improving software quality, but what happens when you try to adopt TDD while working with code that was not designed with testing in mind? Legacy code bases present special challenges. This page provides a quick reference to techniques for getting legacy code under test.

The techniques on this page are all taken from Working Effectively with Legacy Code by Michael Feathers. All links on this page to the original text are provided via Safari Books Online.

Putting legacy code under test

Do not attempt to develop unit tests for an entire legacy code base before implementing any change. Develop unit tests incrementally as changes to the legacy code base are required, covering the areas of the system that will be affected by the changes you are making. These tests are your safety net. They let you know that your changes are not breaking existing behavior that is known to work.

The procedure for safely making changes to a legacy code base looks broadly like this:

Identify the points in the code that must be changed.
Identify the points to be tested to cover the changes. In object-oriented systems, these are the classes to be changed; in procedural systems, the functions.
Break dependencies so that the code to be tested can be run inside your test harness.
Write the tests.
Make the changes and refactor the code base.

Dependencies

When attempting to implement tests for a legacy code base, dependencies on other components will frequently prevent you from putting the code under test. The issues come in a variety of flavors:

Dependencies on external systems, data models, etc. are baked into the code we need to modify; the logic we need to test hasn't been sufficiently abstracted.
We can't physically instantiate a particular class in our testing harness:
- It tries to pull in external libraries and APIs that can't run in the testing harness.
- Construction of an instance of the class requires passing objects we can't create. (Feathers calls this a case of an "irritating parameter".) Examples: DB connection, network socket, etc.
- The code we need to test is tied directly to event handlers in GUI or other UI code that cannot be executed independently of user action.

Dependency is one of the most critical problems in software development. Much legacy code work involves breaking dependencies so that change can be easier.
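
One common way to ease the "irritating parameter" case above is to extract an interface for the parameter so that a test can pass a stand-in. The following Java sketch is only an illustration under stated assumptions: RateSource, DatabaseRateSource, FixedRateSource and InvoiceCalculator are hypothetical names invented for this example, not code from Feathers' text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface extracted from a class that used to hit the database directly.
interface RateSource {
    double taxRateFor(String region);
}

// Assumed production implementation. A real one would need a live DB connection,
// which is exactly what makes it an "irritating parameter" in a test harness.
class DatabaseRateSource implements RateSource {
    @Override
    public double taxRateFor(String region) {
        throw new UnsupportedOperationException("needs a live database connection");
    }
}

// Test stand-in: returns rates configured by the test, 0.0 for unknown regions.
class FixedRateSource implements RateSource {
    private final Map<String, Double> rates = new HashMap<>();

    void put(String region, double rate) {
        rates.put(region, rate);
    }

    @Override
    public double taxRateFor(String region) {
        return rates.getOrDefault(region, 0.0);
    }
}

// The code under test now depends on the interface, not on the database class.
class InvoiceCalculator {
    private final RateSource rates;

    InvoiceCalculator(RateSource rates) {
        this.rates = rates;
    }

    double totalWithTax(double net, String region) {
        return net * (1.0 + rates.taxRateFor(region));
    }
}
```

With this shape, a test can construct InvoiceCalculator with a FixedRateSource, while production code keeps passing the database-backed implementation.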

The Legacy Code Dilemma

When we change code, we should have tests in place. To put tests in place, we often have to change code.

The two reasons for breaking dependencies

To separate the code we want to put under unit test from other dependencies that make it impossible or extremely difficult to run under the test harness. Example: A web application that depends on the Java Servlet API makes it difficult to instantiate the servlets in our test environment without a web container.
To sense the effects of our code on other components in the system. Using the same example of a web application, our tests are going to need to inject certain HTTP requests into the component under test and detect if certain responses are emitted. We might do this by creating a wrapper around our application that provides a simplified interface to substitute for HttpServletRequest and HttpServletResponse.
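
A minimal Java sketch of the wrapper idea described above, assuming hypothetical SimpleRequest and SimpleResponse interfaces that stand in for HttpServletRequest and HttpServletResponse. GreetingHandler, MapRequest and RecordingResponse are likewise invented for illustration. In production, thin adapters (not shown) would implement the two interfaces by delegating to the real servlet objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified views of an HTTP request and response.
interface SimpleRequest {
    String parameter(String name);
}

interface SimpleResponse {
    void setStatus(int code);
    void write(String body);
}

// The logic under test depends only on the simplified interfaces (separation).
class GreetingHandler {
    void handle(SimpleRequest request, SimpleResponse response) {
        String name = request.parameter("name");
        if (name == null || name.isEmpty()) {
            response.setStatus(400);
            response.write("missing name");
            return;
        }
        response.setStatus(200);
        response.write("Hello, " + name);
    }
}

// Test-side request: lets the test inject whatever parameters it needs.
class MapRequest implements SimpleRequest {
    private final Map<String, String> params = new HashMap<>();

    MapRequest with(String name, String value) {
        params.put(name, value);
        return this;
    }

    @Override
    public String parameter(String name) {
        return params.get(name);
    }
}

// Test-side response: records what the handler emitted (sensing).
class RecordingResponse implements SimpleResponse {
    int status;
    final StringBuilder body = new StringBuilder();

    @Override
    public void setStatus(int code) {
        status = code;
    }

    @Override
    public void write(String text) {
        body.append(text);
    }
}
```

A test would build a MapRequest, call handle, and then inspect the status and body fields of the RecordingResponse, with no web container involved.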

Fake objects - A way to substitute a facsimile for dependencies that can't be instantiated in the test environment. The facsimile emulates the behavior of the dependency well enough for the test. Fake objects also allow us to sense the effects of our code under test.

Fake objects can break the rules of good design. Use public properties and methods to make it easy to set and retrieve values from test code.
Fake objects are not as sophisticated as full-blown mock objects. Mocks provide a more complete simulation of the object being substituted and have the built-in ability to set assertions for acceptable interactions from the test code. Mocking frameworks exist for most OO languages and can be quite useful; however, simple fake objects will be acceptable in most situations.
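
As a rough illustration of a hand-written fake, the Java sketch below is loosely modeled on the point-of-sale style of example used in Feathers' book. The Display, FakeDisplay and Sale classes and the hard-coded item lookup are assumptions made for this page, not a reproduction of the original code.

```java
// The dependency the production code talks to (for example, a hardware display).
interface Display {
    void showLine(String line);
}

// Fake object: remembers the last line shown so a test can inspect it.
// The field is public on purpose; fakes may bend the usual design rules.
class FakeDisplay implements Display {
    public String lastLine = "";

    @Override
    public void showLine(String line) {
        lastLine = line;
    }
}

// Code under test: receives its Display from the caller.
class Sale {
    private final Display display;

    Sale(Display display) {
        this.display = display;
    }

    void scan(String barcode) {
        // Hard-coded lookup, assumed here only to keep the sketch self-contained.
        String item = "1".equals(barcode) ? "Milk $3.99" : "Unknown item";
        display.showLine(item);
    }
}
```

A test passes a FakeDisplay into Sale, calls scan, and then reads lastLine to sense what would have been shown on the real device.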

Seams

When you break dependencies by using fake objects or other techniques, you will need a way to activate the appropriate behavior when the code is running under test vs. in the production environment. To do so, you must identify seams in the code.

A seam is a place where you can alter behavior in your program without editing in that place.

Every seam has an enabling point, a place where you can make the decision to use one behavior or another.

Types of seams:

Preprocessing seams - use the macro facility built into the language to substitute in fake implementations of dependencies while under test and the real implementations in production (e.g. in C/C++, #ifdef TESTING, etc.).
Link seams - substitute alternate implementations by relying on the linker. The alternate implementation must use an identical interface. This can be done at build time, e.g. through the -l options passed to the compiler in the Makefile for C or C++, or at run time, e.g. by setting the Java classpath.
Object seams - the most powerful and cleanest seam available in OO languages. They allow us to substitute a new implementation by creating a test class that inherits from the same expected base class or implements the same interface. Not all method calls are seams. A seam requires an enabling point, so a case where we create an object instance and call a method on it within a single method is not a seam. However, if we pass in the object to be operated on as a parameter, then the argument list for the method can be an enabling point.
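
The difference between a method call that is a seam and one that is not can be sketched in Java as follows. MessageSender, SmtpSender, Notifier and CollectingSender are hypothetical names used only for this illustration, and SmtpSender simply throws to mimic a dependency that cannot run in the test harness.

```java
import java.util.ArrayList;
import java.util.List;

interface MessageSender {
    void send(String message);
}

// Assumed production implementation; it throws here to mimic a dependency
// that is unavailable inside the test harness.
class SmtpSender implements MessageSender {
    @Override
    public void send(String message) {
        throw new UnsupportedOperationException("no mail server in the test harness");
    }
}

class Notifier {
    // Not a seam: the object is created and used inside one method, so there is
    // no enabling point where a test could choose a different behavior.
    void welcomeHardWired(String user) {
        new SmtpSender().send("Welcome, " + user);
    }

    // Object seam: the argument list is the enabling point. The call
    // sender.send(...) can behave differently without editing this method.
    void welcome(String user, MessageSender sender) {
        sender.send("Welcome, " + user);
    }
}

// Test-side implementation that collects messages instead of sending them.
class CollectingSender implements MessageSender {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String message) {
        sent.add(message);
    }
}
```

Production code would call welcome with an SmtpSender, while a test passes a CollectingSender and checks the sent list afterwards.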

Catalog of specific issues

Feathers' text is organized in an FAQ format. The links below will take you to the appropriate chapter via Safari Books Online to assist you in overcoming the specific issue you are having putting your code under test:

More general legacy code base issues: