Monday, October 29, 2012

Documenting smells for untestable .NET code with NSubstitute and NUnit

Retrofitting unit tests is hard work, and the knowledge we build up about the code is easily forgotten by the time it finally gets refactored. We want to communicate our testing intentions, and how the code could be made more testable, to the developers (and to our future selves, once enough time has passed to forget what we've learned).

One of the axioms of unit testing is: always keep production and test code separate. Now if you program in .NET/C# and create a separate test project, friend assemblies (InternalsVisibleTo) or this pattern can help you overcome the visibility problem: if I can only access the public interface of my code, there is a lot of inaccessible code that I still need to test.
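
As a quick reminder, the friend-assembly route boils down to a single attribute on the production assembly; the test assembly name below is just a placeholder:

// In the production assembly (e.g. in AssemblyInfo.cs):
// "MyProduct.Tests" stands in for the name of your actual test assembly.
[assembly: System.Runtime.CompilerServices.InternalsVisibleTo("MyProduct.Tests")]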

But if you work with legacy code, you will likely run into a lot of untestable code. Tight coupling, hidden dependencies, dependencies on concrete implementations and many others are common problems you would normally conquer by refactoring your code (see e.g. Michael C. Feathers's dependency-breaking techniques in Part III of his book).

Sometimes, though, you won't have permission to change the production code. Sometimes you just need to document the code for future reference and prepare test cases, so the knowledge you've built up doesn't go to waste before the code finally gets refactored.

In this world of retrofitting tests to legacy code, you will often find it useful to document the smells or pathologies that make your testing so hard, or even impossible. Instead of writing bug reports, source comments, etc., you might consider the following pattern next time.

For example, imagine you have the following code you're not allowed to make changes to:

public class HasDependencyToImplementation{

  private readonly Class c;          // dependency on the concrete class
  private T privateProperty;         // T stands in for whatever Class.DoSomething returns
  public T PublicProperty { get; private set; }

  public HasDependencyToImplementation(Class c)
  {
    this.c = c;
    // ...
  }

  public void DoSomethingWithC(){
    this.privateProperty = this.c.DoSomething();
    this.PublicProperty = this.WorkWithPrivateProperty();
  }

  private T WorkWithPrivateProperty(){
    // ... do something with this.privateProperty ...
  }

}

public class Class{ /* ... */ }

Now, if you need to take control of the method Class.DoSomething, you'll normally be advised to extract an interface from Class; the constructor could then accept a mocked instance. But what if you're not allowed to do that? Do you want to call the developer and ask them to please refactor that code and check it in again? Does your developer even have the time or the inclination to listen to your pleas? Let's assume they don't. Then with NSubstitute and NUnit you might still write a (failing) test like this:

...
[Test]
public void DoSomethingOnC_ChangesState(){
   var c = Substitute.For<Class>();
   var controllableValue = getValue();
   c.DoSomething().Returns(controllableValue);

   var hasDependency = new HasDependencyToImplementation(c);
   hasDependency.DoSomethingWithC();
   var expected = getExpectedFor(controllableValue);

   Assert.That(
       hasDependency.PublicProperty,
       Is.EqualTo(expected));
}

This test would compile, but when we run it we get an exception from NSubstitute: Class is not abstract and Class.DoSomething isn't virtual, so NSubstitute cannot override it. Our problem is this: on the one hand, we want the test to exist as documentation and for future testing once the refactoring is done; on the other hand, the refactoring developers won't necessarily interpret the test result as a to-do for them, and you yourself might forget what the problem was by the next time you work with the code. A solution might be the following:

We implement a custom assert on top of NUnit.Framework.Assert:

public static void HasSmell(int PRIO, string PROBLEM, string PROPOSAL){
   // Fails the test immediately with a formatted description of the smell.
   Assert.Fail(
       helper.GetMessage(PRIO, PROBLEM, PROPOSAL));
}

and implement constants for common smells and proposed refactorings:

public static class Smells{
   public const string IrritatingParameter = "Irritating parameter";
   // ...
}

public static class Refactorings{
   public const string ExtractInterface = "Extract interface";
   // ...
}
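
The post doesn't show the helper that builds the failure message, nor where the HIGH priority comes from. A minimal sketch, assuming priorities are plain integer constants and the message should mirror the runner output shown further below (the names Prio, helper and Names are assumptions for this example), could look like this:

public static class Prio{
   public const int LOW = 0;
   public const int MEDIUM = 1;
   public const int HIGH = 2;
}

internal static class helper{
   private static readonly string[] Names = { "LOW", "MEDIUM", "HIGH" };

   // Formats the smell description the way the test runner should display it.
   public static string GetMessage(int prio, string problem, string proposal){
      return "SMELL PRIO " + Names[prio]
           + "\nPROBLEM: " + problem
           + "\nPROPOSAL: " + proposal;
   }
}

In the test below, HIGH would then refer to such a constant (e.g. Prio.HIGH).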

Then we can add documentation for developers:


...
[Test]
public void DoSomethingOnC_ChangesState(){

   Assert.HasSmell(
          PRIO: HIGH,
          PROBLEM: Smells.IrritatingParameter
                  +"\nClass c: cannot be mocked for checking.",
          PROPOSAL: Refactorings.ExtractInterface);

   var c = Substitute.For<Class>();
   var controllableValue = getValue();
   ...
}

This test will not only fail when developers run it; the NUnit runner will also tell them:

Test DoSomethingOnC_ChangesState failed:
SMELL PRIO HIGH
PROBLEM: Irritating parameter
Class c: cannot be mocked for checking.
PROPOSAL: Extract interface

Monday, July 9, 2012

History Based Heuristics for Regression Testing

Common testing knowledge - see for example this article - recommends that you:
  1. Identify the test areas of your product (typically based on what has changed since the last release)
  2. Select and add test cases based on the identified areas
  3. Add some basic test cases (some kind of smoke test)
This really is a good way to get the job done. In a legacy environment, however, where the testing process has typically existed for far less time than the product under test, you might want some additional testing to be done.



Imagine regression testing as part of a bigger test iteration planned for your next product release. Since steps 1 through 3 only let you check whether recently modified items (i.e. those changed since the last release) still work as expected, you are left with a blind spot: all the things that have never been tested at all.

How to approach this? You might want to organise a bug bash. Or you might consider expanding step 3 to cover more test areas. But which ones would those be? If you work in a legacy environment, you probably at least have a bug tracking system (BTS) at your disposal that has been around since the beginning of your company's development process (or at least longer than your testing documentation).

The idea is quite simple and is analogous to the one used to identify test items for test-driven maintenance.

First we identify key numbers that we can normally obtain by querying the underlying database of our BTS:

  • D (Number of Duplicate Bug Reports): Duplicates come up if several users have found an issue in different contexts. The more duplicates a bug report has, the more important it is to us to identify the underlying area.
  • R (Number of Related Bug Reports): The more related bug reports are known, the bigger the test area must be, and therefore the more important the bug report itself.
  • C (Number of Comments): Many comments indicate a matter worth discussing. This number also includes status changes (like New => Confirmed).
  • S (Summary Length): Although a long, unconcise bug summary may be a symptom of bad style, we want to assume that more complex bugs need a longer summary.
  • B (Description Length): Same as for S: we want to assume that more complicated issues need a longer description / report.
  • P (Priority, mapped to a numeric value): There might be different priorities (internal, external), but after all there's a reason why someone decided to assign that priority.
  • T (Creation Date): If the first bug report ever has status fixed and the bug hasn't come up again after n years, we assume the feature to be stable. Also, if we tested a feature in our last internal release test ("recent date"), we might not need to re-test it again.

From all these key numbers we can derive values indicating a bug report's relevance for our regression test.
Assuming linear relations, we might then calculate numbers {n_D, ..., n_T} in [0,1] for each bug report like this:

  • n_D = D / (maximal number of duplicates of any bug report)
  • n_R = R / (maximal number of related bug reports of any bug report)
  • n_C = C / (maximal number of comments of any bug report)
  • n_S = S / (maximal number of characters in the trimmed summary of any bug report)
  • n_B = B / (maximal number of characters in the trimmed description of any bug report)
  • n_P = P / (maximal possible priority)
  • n_T = ticks(T) / ticks(recent date), or 0 if the report was created after the set recent date.
We don't have much experience with which of these numbers might be the most important, and we don't want to make a scientific study out of it, so we set weights {w_D, ..., w_T} as numbers in [0,1] such that w_D + ... + w_T = 1. These weights allow us to play with how much importance we attribute to each key number. For example, if we only wanted to evaluate our bug reports based on the creation date, we would set w_D = ... = w_P = 0 and w_T = 1, meaning that we don't care about the other key numbers.

For any selection of these weights, we then calculate the relevance of each bug report as the weighted mean

K(bug report) = w_D * n_D + w_R * n_R + w_C * n_C + w_S * n_S + w_B * n_B + w_P * n_P + w_T * n_T
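
As an illustration, here is a minimal C# sketch of this scoring. The BugReport type, its field names and the chosen weights are assumptions made for the example, not part of any real BTS schema:

using System;
using System.Collections.Generic;
using System.Linq;

public class BugReport{
   public int Duplicates;          // D
   public int Related;             // R
   public int Comments;            // C
   public int SummaryLength;       // S: characters in the trimmed summary
   public int DescriptionLength;   // B: characters in the trimmed description
   public int Priority;            // P: already mapped to a numeric value
   public DateTime Created;        // T
}

public static class RegressionRelevance{

   // Weights in [0,1] summing to 1 - tune them to your own intuition.
   const double wD = 0.2, wR = 0.15, wC = 0.15, wS = 0.05, wB = 0.05, wP = 0.2, wT = 0.2;

   public static double K(BugReport r, IList<BugReport> all, DateTime recentDate){
      // Normalize each key number by its maximum over all bug reports,
      // so that every n_X lies in [0,1].
      double nD = Norm(r.Duplicates,        all.Max(b => b.Duplicates));
      double nR = Norm(r.Related,           all.Max(b => b.Related));
      double nC = Norm(r.Comments,          all.Max(b => b.Comments));
      double nS = Norm(r.SummaryLength,     all.Max(b => b.SummaryLength));
      double nB = Norm(r.DescriptionLength, all.Max(b => b.DescriptionLength));
      double nP = Norm(r.Priority,          all.Max(b => b.Priority));
      double nT = r.Created > recentDate
                  ? 0.0
                  : (double)r.Created.Ticks / recentDate.Ticks;

      return wD * nD + wR * nR + wC * nC + wS * nS + wB * nB + wP * nP + wT * nT;
   }

   static double Norm(int value, int max){
      return max == 0 ? 0.0 : (double)value / max;
   }
}

You would then sort your bug reports by K in descending order and let the top entries point you to the areas worth re-testing.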


K now gives us a good means of identifying the bug reports that matter most to us, according to our intuitions about how much each factor should be taken into account. What do you think?