As a demonstration of unit testing, we will consider a class implementing stop list functionality. A stop list is a tool used in building search engines and other retrieval systems. In such systems, documents are usually indexed by the words they contain. A document on the organization of the U.S. government would probably contain the terms "President", "Senate", and "Supreme Court", among others. The document would be indexed by these words so that users who type some combination of them into a search form would see this document in their list of search results.
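The indexing idea above can be sketched as a tiny inverted index: a mapping from each word to the documents that contain it. This is a minimal sketch, assuming that words are runs of letters extracted with a regular expression and that a query matches a document only if every query word appears in it; `build_index` and `search` are hypothetical helpers for illustration, not part of the stop list class developed on this page.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumption: a "word" is a run of letters; case and punctuation are ignored.
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    # Copy the first posting set so intersecting does not modify the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


docs = {
    1: "The President and the Senate",
    2: "The Supreme Court ruled",
    3: "Where is the court?",
}
index = build_index(docs)
print(sorted(search(index, "Supreme Court")))  # [2]
print(sorted(search(index, "court")))          # [2, 3]
```

Note that a query for "the" matches all three sample documents, which is exactly why such words do little to help a user find a particular document.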

The purpose of a stop list is to make a system more efficient at processing and storing documents for indexing and queries for retrieval. A stop list lets the system efficiently ignore words that do nothing to improve a user's ability to find information and that may, in some cases, hurt system performance. Stop lists contain words that are poor discriminators of content. For example, the word "where" might be used in the document on U.S. government described above. It is also likely to appear in many other documents, on topics ranging from closely related to entirely unrelated, so its presence does little to distinguish one document from another. Words such as "where" are called "stop words," and an information retrieval system is usually designed to ignore them. Other obvious stop words are "a", "an", and "the."
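A stop list can be sketched as a set of words with a membership test, so each lookup is an average constant-time hash lookup rather than a scan of the whole list. This is a minimal sketch; the class name `StopList`, its methods `is_stop_word` and `remove_stop_words`, and the default word list (the stop words mentioned above) are assumptions for illustration, not the specification of the class developed on this page.

```python
class StopList:
    """A minimal stop list: words an indexer should skip (hypothetical API)."""

    def __init__(self, words=("a", "an", "the", "where")):
        # Store words lowercased in a set for case-insensitive,
        # average constant-time membership checks.
        self._words = {w.lower() for w in words}

    def is_stop_word(self, word):
        """Return True if the word is on the stop list, ignoring case."""
        return word.lower() in self._words

    def remove_stop_words(self, tokens):
        """Return the tokens that are not stop words, in their original order."""
        return [t for t in tokens if not self.is_stop_word(t)]


stop_list = StopList()
print(stop_list.remove_stop_words("Where the President and the Senate meet".split()))
# ['President', 'and', 'Senate', 'meet']
```

With this small default list, "and" survives filtering; a production stop list would typically contain many more common words.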

To develop our stop list class, we will apply unit testing in the style of test-driven development (TDD). In TDD, before we implement a feature (e.g., a method within our stop list class), we first write a test case that, once the feature is implemented, will verify that it meets its specification. The motivating idea behind TDD is that the developer should first reflect on a feature's specification and on how it will be used by the systems it is part of.
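A TDD cycle for the stop list might begin with tests like the ones below, written before any implementation exists; running them at that point reports errors because `StopList` is not yet defined, which is the expected first step. Only then is the simplest class that makes the tests pass written. This is a sketch using Python's standard `unittest` module; the test names, the `StopList` class, and its `is_stop_word` method are hypothetical stand-ins for whatever specification the developer settles on.

```python
import unittest


# Step 1 (written first): tests that pin down the specification.
class StopListTest(unittest.TestCase):
    def test_articles_are_stop_words(self):
        stop_list = StopList(["a", "an", "the"])
        for word in ("a", "an", "the"):
            self.assertTrue(stop_list.is_stop_word(word))

    def test_lookup_ignores_case(self):
        self.assertTrue(StopList(["the"]).is_stop_word("The"))

    def test_content_words_are_not_stop_words(self):
        self.assertFalse(StopList(["the"]).is_stop_word("Senate"))


# Step 2 (written afterward): the simplest class that makes the tests pass.
class StopList:
    def __init__(self, words):
        # Lowercase once so lookups are case-insensitive.
        self._words = {w.lower() for w in words}

    def is_stop_word(self, word):
        return word.lower() in self._words
```

Saved as a module, the tests can be run with `python -m unittest <module_name>`.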