As a demonstration of unit testing in Python, we will consider a class
implementing stop list functionality. A stop list is a tool used in
building search engines and other retrieval systems. In such systems,
documents are usually indexed by the words they contain. A document on
the organization of the U.S. government would probably contain the terms
"President", "Senate", and "Supreme Court" among others. The document
would be indexed by these words so that users who type some
combination of them into a search form would see this document appear
in their list of search results.
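
To make the indexing idea concrete, here is a minimal sketch of an inverted index: a mapping from each word to the documents that contain it. The function names build_index and search are hypothetical. Tokenization is assumed to be a naive lowercase whitespace split, and a query is assumed to match only documents that contain all of its words.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the documents for the first word, then intersect with the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


docs = {
    1: "the president and the senate",
    2: "the supreme court",
}
index = build_index(docs)
print(sorted(search(index, "president senate")))  # → [1]
```

Note that a word like "the" maps to every document here, which is exactly the kind of entry a stop list is meant to keep out of the index.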

The purpose of a stop list is to increase the efficiency with which a
system can process and store documents for indexing and queries for
retrieval. A stop list makes it possible for the system to efficiently ignore
words that do nothing to improve a user's ability to find information
and that may, in some cases, hurt system performance. Stop lists encode
words that make poor discriminators of content. For example, the word
"where" might be used in the document on U.S. government described
above. It is also likely to appear in many other documents on topics ranging
from very similar to entirely different, so a query containing it does
little to narrow the results. Words such as "where" are
deemed "stop words," and an information retrieval system is usually
designed to ignore them. Other obvious stop words are "a", "an",
and "the".
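
A stop list can be pictured as a set of words consulted while text is broken into index terms. The sketch below is illustrative only: STOP_WORDS and remove_stop_words are hypothetical names, the four-word list is far smaller than a real stop list, and this is not the stop list class developed in this tutorial.

```python
# Hypothetical stop list for illustration; real systems use much larger lists.
STOP_WORDS = frozenset({"a", "an", "the", "where"})


def remove_stop_words(text):
    """Return the lowercased words of text that are not stop words, in order."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


print(remove_stop_words("Where the Senate meets"))  # → ['senate', 'meets']
```

A frozenset is used because membership tests on a set are fast and the list is not meant to change at run time.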

To develop our stop list class, we will apply unit testing in the style of
[test-driven development|http://en.wikipedia.org/wiki/Test-driven_development]
(TDD). In TDD, before we implement a
feature (e.g., a method within our stop list class), we first write a test
case that, once the feature is implemented, will verify that it meets its
specification. The motivating idea for TDD is that the developer
should first reflect on a feature's specification and on how it will be
used in the systems of which it is a part.
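
The test-first order can be sketched in a single file. Everything here is hypothetical: the StopList class, its is_stop_word method, and the case-insensitive matching requirement are assumptions made for illustration, not the design this tutorial arrives at. The tests are written first; only afterwards is just enough code added to make them pass.

```python
import io
import unittest


# Step 1 ("red"): write the tests first. They encode the specification of a
# hypothetical StopList.is_stop_word method before that method exists; run at
# this point, each test would error out with a NameError.
class IsStopWordTest(unittest.TestCase):
    def test_common_words_are_stop_words(self):
        stop_list = StopList(["a", "an", "the"])
        self.assertTrue(stop_list.is_stop_word("the"))

    def test_content_words_are_not_stop_words(self):
        stop_list = StopList(["a", "an", "the"])
        self.assertFalse(stop_list.is_stop_word("senate"))

    def test_lookup_ignores_case(self):
        # Assumed requirement: matching is case-insensitive.
        stop_list = StopList(["the"])
        self.assertTrue(stop_list.is_stop_word("The"))


# Step 2 ("green"): write just enough code to make the tests pass.
class StopList:
    def __init__(self, words):
        # Store lowercased words in a set for fast membership tests.
        self._words = {word.lower() for word in words}

    def is_stop_word(self, word):
        return word.lower() in self._words


# Run the tests, sending the report to an in-memory buffer.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsStopWordTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())  # → 3 True
```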

In Python, as in most languages, unit testing is
automated by a framework that takes as input a test case and the
code on which to run it. As output, it generates a report on the
results of executing the test case. The unit
testing framework in the Python standard library is implemented in the unittest
module. This tutorial describes one approach to using
unittest; for a broader perspective, see the [official documentation on
unittest|http://docs.python.org/library/unittest.html].
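
The input/output relationship described above can be seen by driving unittest directly. In this sketch, word_count is a hypothetical function standing in for the code under test, and WordCountTest is the test case; unittest.defaultTestLoader.loadTestsFromTestCase and unittest.TextTestRunner are real parts of the unittest module.

```python
import io
import unittest


def word_count(text):
    """Code under test: a trivial, hypothetical function for this illustration."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("the supreme court"), 3)

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)


# The framework collects the test methods, runs them against the code, and
# writes a report to a stream (stderr by default; an in-memory buffer here).
stream = io.StringIO()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)

print(stream.getvalue())
print(result.testsRun, result.wasSuccessful())  # → 2 True
```

The returned result object also exposes failures and errors lists, which is useful when you want to inspect a run from code rather than read the printed report.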

A test case is a class containing methods for testing each requirement
of a feature to ensure all aspects of the specification are
implemented and function correctly. We implement test cases as
subclasses of unittest.TestCase. Our first step will be to implement
an empty test case and run it to be sure we have the basics in
place. The following listing shows our initial test case, saved in a file
called term_test.py.

term_test.py
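import unittest

class StopListTestCase(unittest.TestCase):
    # empty test case; 'pass' is required because a class body cannot be only a comment
    pass

if __name__ == '__main__':
    unittest.main()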
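
As a quick check of the basics, running the empty test case should report that zero tests ran and that nothing failed. The sketch below repeats the class so that it is self-contained and runs it programmatically with the standard unittest loader and runner. From a shell, python -m unittest term_test runs the module directly; the exact summary it prints varies between Python versions, and recent releases may flag that no tests were run.

```python
import io
import unittest


class StopListTestCase(unittest.TestCase):
    # empty test case, mirroring term_test.py
    pass


# Load every test method (there are none yet) and run them.
stream = io.StringIO()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StopListTestCase)
result = unittest.TextTestRunner(stream=stream).run(suite)

print(result.testsRun)  # → 0
print(result.wasSuccessful())  # → True
```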
