As a demonstration of unit testing in Python we will consider a class implementing stop list functionality. A stop list is a tool used in building search engines and other retrieval systems. In such systems, documents are usually indexed by the words they contain. A document on the organization of the U.S. government would probably contain the terms "President", "Senate", and "Supreme Court" among others. The document would be indexed by these words so that users who type some combination of them into a search form would see this document appear in their list of search results.

The purpose of a stop list is to increase the efficiency with which a system can process and store documents for indexing and queries for retrieval. A stop list makes it possible for the system to ignore words that do nothing to improve a user's ability to find information and that may, in some cases, hurt system performance. Stop lists encode words that make poor discriminators of content. For example, the word "where" might be used in the document on U.S. government described above. It will also be used in many other documents on topics ranging from very similar to entirely different. Words such as "where" are deemed "stop words." An information retrieval system is usually designed to ignore such words. Other obvious stop words are "a", "an", and "the."
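To make the idea concrete, here is a minimal sketch of stop word filtering. The word set and the function name remove_stop_words are assumptions chosen for illustration; they are not part of the class developed in this tutorial.

```python
# Hypothetical stop words, chosen only for illustration.
STOP_WORDS = frozenset({"a", "an", "the", "where"})


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased words of `text` with stop words dropped."""
    return [word for word in text.lower().split() if word not in stop_words]


terms = remove_stop_words("Where the President meets the Senate")
print(terms)  # ['president', 'meets', 'senate']
```

Only the surviving terms would be passed on to the indexer, so common words such as "the" never take up space in the index.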

To develop our stop list class we will use a test-driven development (TDD) application of unit testing. In TDD, before we implement a feature (e.g., a method within our stop list class), we first write a test case that, once the new feature is implemented, will verify that it meets its specification. The motivating idea for TDD is that the developer should first reflect on a feature's specification and how it will be used in systems of which it is a part.
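The test-first ordering can be sketched as follows. The class name StopList and the method is_stop_word are hypothetical names used only for this illustration, and the implementation is the least code that satisfies the two tests, not a finished design.

```python
import unittest


# Step 1: write the test case first. It records what we expect of a
# feature that does not exist yet (StopList is a hypothetical name).
class TestStopList(unittest.TestCase):
    def test_listed_word_is_stop_word(self):
        stop_list = StopList(["a", "an", "the"])
        self.assertTrue(stop_list.is_stop_word("the"))

    def test_unlisted_word_is_not_stop_word(self):
        stop_list = StopList(["a", "an", "the"])
        self.assertFalse(stop_list.is_stop_word("senate"))


# Step 2: only then write just enough code to make the tests pass.
class StopList:
    def __init__(self, words):
        # A set gives fast membership checks.
        self._words = set(words)

    def is_stop_word(self, word):
        return word in self._words
```

Run before step 2 is written, both tests fail with a NameError, which is the expected starting point in TDD. Once the class exists, the same tests pass without modification.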

In Python, as in most languages, unit testing is automated within a framework that takes as input a test case and the code on which to run it. As output, it generates a report on the results of executing the test case. The unit testing framework in the Python standard library is implemented in the unittest module. This tutorial will describe one approach to using unittest. For a broader perspective, see the official documentation on unittest.
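The input-and-report cycle described above can be sketched with a deliberately tiny test case. The names TestLowercasing and run_and_report are assumptions made for this example; the loader, runner, and result objects are standard parts of unittest.

```python
import io
import unittest


class TestLowercasing(unittest.TestCase):
    """A trivial test case, used only to show the framework at work."""

    def test_lower(self):
        self.assertEqual("The".lower(), "the")


def run_and_report(test_case_class):
    """Run one test case class and return (result, report text)."""
    # Input: collect the test methods of the class into a suite.
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    # Output: the runner writes its report to the stream we give it.
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return result, stream.getvalue()


result, report = run_and_report(TestLowercasing)
print(report)
```

The returned result object also exposes the outcome programmatically, for example through result.testsRun and result.wasSuccessful(), which is useful when the report is consumed by another tool instead of a person.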

...