Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

As a demonstration of unit testing in Python we will consider a class implementing stop list functionality. A stop list is a tool used in building search engines and other retrieval systems. In such systems, documents are usually indexed by the words they contain. A document on the organization of the U.S. government would probably contain the terms "President", "Senate", and "Supreme Court" among others. The document would be indexed by these words so that users who type some combination of them into a search form would see this document appear in their list of search results.

The purpose of a stop list is to increase the efficiency with which a system can process and store documents for indexing and queries for retrieval. They make it possible for the system to efficiently ignore words that do nothing to improve a users ability to find information and may, in some cases, hurt system performance. Stop lists encode words that make poor discriminators of content. For example, the word "where" might be used in the document on U.S. government described above. It will also be used in many other documents on topics ranging from very similar to entirely different. Words such as "where" are deemed "stop words." An information retrieval system is usually designed to ignore such words. Other obvious stop words are "a", "an", and "the."

To develop our stop list class we will use a test-driven development (TDD) application of unit testing. In TDD, before we implement a feature (e.g., a method within our stop list class), we first write a test case that, once the new feature is implemented, will ensure it meets specification. The motivating idea for TDD is that the developer should first reflect on a feature's specification and how it will be used in systems of which it is a part.

In Python, as is the case with most languages, unit testing is automated within a framework that takes as input, a test case and code on which to run the test case. As output, it generates a test report on the results of having executed the test case. The unit testing framework in the Python library is implemented in the unittest module. This tutorial will describe one approach to using unittest. For a broader perspective, see the official documentation on unittest for more information.

A test case is a class containing methods for testing each requirement of a feature to ensure all aspects of the specification are implemented and function correctly. We implement test cases as subclasses of unittest.TestCase. Our first step will be to implement an empty test case and run it to be sure we have the basics in place. Following is a code listing for our initial test case in a file called term_test.py.

Code Block
titleterm_test.py


import unittest
	
class StopListTestCase(unittest.TestCase):
    # empty test case