This year's winner for the best presentation at QA&Test 2011 in Bilbao was Julian Harty and I am totally convinced that it was the right choice. Julian did not only present in a very clear style, combining theory with practical examples, but he presented a totally new view to accessability testing (@testinggeek.com).
Does your company care about accessability issues of their products?
We don't! And I don't know of anyone else. It's expensive and normally not worth the effort. Public institutions, of course, need to care about this, but private companies? Why should they care?
Julian's brilliant idea consists in using accessability technology to improve testability of our products! (see his slides, 18)
Brilliant of course because he found another use for accessability which is testability (and search engine optimisation see his slides, 18). Everybody is happy and the product gets improved in various manners: More SEO for the management, more testability for testers, better usability and accessability for the users.
In Germany we have a saying for this: it's like a kinder surprise!
(because it contains - as advertised - three things at the same time: amazement, chocolate AND a toy :-) )
Saturday, October 29, 2011
Monday, October 10, 2011
Effort estimates - be brave, invent your own
Inspired by Michael Feathers' Working Effectively With Legacy Code (on google books) I felt brave enough to confront my latest "homework" in testing: "Write a detailed test plan about how to test that code!" The technical term would be "retrofitting tests for oo code" I understand.
Now, unfortunately I was rather restrained because some premises were put up that made many of Feathers' advices impossible to follow:
1.) Tests will be seperated from the developers' code, into their own projects.
2.) Only public members are relevant.
3.) You cannot break dependencies.
4.) You need to mainly write characterisation tests.
5.) My boss needs an idea about cost estimation.
So how did I start? Well, first I used that good old tranquilizer: counting. So I wrote a program using System.Reflection (.NET) to list and count the following (per project):
1.) Non-abstract classes (without enums) t (note that in System.Reflection static classes are typed sealed and abstract)
2.) Methods (and constructors) m
3.) Properties and fields (without constants) p
I then loosely mimicked a formula I found on the internet (mathematik.uni-ulm.de) for calculating the effort on the following basis:
1.) A first impression cannot count each testing state (of input parameters/the testing object), neither too much dependencies without digging too deep.
2.) There's a stable average of testing states and complexity.
3.) Objects will be used as parameters, too.
4.) My boss wants a simple answer, not a maths thesis on the subject, things need to get done.
After much trying demanding attention of a very good and patient friend I have (thank you Ewald!) I came back to the formula I started from:
testing_effort = p*m*t*TEU
(where p,m,t as above and TEU refers to "testing effort unit" which would need to be interpolated)
When I applied that formula to a recent unit testing project to interpolate TEU and after that used it to calculate a first estimation, I could be sure this wasn't the right way, as my results foresaw some gigantic ressources we just don't dispose of.
So, next step was some static code analysis targeting the question how to reduce test basis. (Remember: "We cannot test everything.")
Some sensible criteria I set up with one of our developers then were e.g.
1.) If there are two overloads, look if they really do something different, kick one out (of your calculation).
2.) If there are e.g. 20 read-only properties of type string, count 1 property of type List<string>.
3.) If two classes inherit from the same abstract class, only count where overriden / where they're different, a.s.o.
The good thing about reading the source code is that you get quite a good picture of the importance of certain classes so you can prioritize the components / units at the same time.
So then I had a final list, different countings, my formula would certainly give better results:
total_testing_effort_1 = \sum_{i=1,...,7} te_i
(where te_i is the testing effort of each assembly, te_i = p_i*m_i*t_i)
The ridiculous result was: 37h. Thirty-seven hours for seven quite complicated assemblies (e.g. there was one object that could only be created by another which itself depended on a third one).
Still I didn't panic - it's just numbers!
So, I thought: to include the dependency my assemblies have on each other, I can instead calculate the testing effort of the total numbers, thus, by mutliplication, establishing a relation between those numbers that is stronger, that is
total_testing_effort_2 = (\sum t_i)*(\sum p_i)*(\sum m_i)
resulting in another ridiculous estimate of 1039 hours.
Knowing that plan had to be written, and that I couldn't live with 37h and my boss couldn't do so with 1039h I had to find some equally acceptable number. So find some mean. It's obvious that the average isn't good enough, so I used the geometric mean of those two:
total_testing_effort = sqrt(total_testing_effort_1*total_testing_effort_2)
And there it was: a magical 119,1h approx. 25 days of 8h work.
You think this is all hocus pocus? Maybe you're right. But I've found the following to be true, too:
1.) There is an effort estimate we feel good about.
2.) My boss understands the code isn't as easy to be tested as he had thought, and will probably be more forgiving when not "everything" will have been tested.
Of course, I still hope my baby will show some good approximation over time :-)
Do you have other means of estimating costs / setting up test plans for legacy code? Found mistakes or nonsense? Please comment!
Now, unfortunately I was rather restrained because some premises were put up that made many of Feathers' advices impossible to follow:
1.) Tests will be seperated from the developers' code, into their own projects.
2.) Only public members are relevant.
3.) You cannot break dependencies.
4.) You need to mainly write characterisation tests.
5.) My boss needs an idea about cost estimation.
So how did I start? Well, first I used that good old tranquilizer: counting. So I wrote a program using System.Reflection (.NET) to list and count the following (per project):
1.) Non-abstract classes (without enums) t (note that in System.Reflection static classes are typed sealed and abstract)
2.) Methods (and constructors) m
3.) Properties and fields (without constants) p
I then loosely mimicked a formula I found on the internet (mathematik.uni-ulm.de) for calculating the effort on the following basis:
1.) A first impression cannot count each testing state (of input parameters/the testing object), neither too much dependencies without digging too deep.
2.) There's a stable average of testing states and complexity.
3.) Objects will be used as parameters, too.
4.) My boss wants a simple answer, not a maths thesis on the subject, things need to get done.
After much trying demanding attention of a very good and patient friend I have (thank you Ewald!) I came back to the formula I started from:
testing_effort = p*m*t*TEU
(where p,m,t as above and TEU refers to "testing effort unit" which would need to be interpolated)
When I applied that formula to a recent unit testing project to interpolate TEU and after that used it to calculate a first estimation, I could be sure this wasn't the right way, as my results foresaw some gigantic ressources we just don't dispose of.
So, next step was some static code analysis targeting the question how to reduce test basis. (Remember: "We cannot test everything.")
Some sensible criteria I set up with one of our developers then were e.g.
1.) If there are two overloads, look if they really do something different, kick one out (of your calculation).
2.) If there are e.g. 20 read-only properties of type string, count 1 property of type List<string>.
3.) If two classes inherit from the same abstract class, only count where overriden / where they're different, a.s.o.
The good thing about reading the source code is that you get quite a good picture of the importance of certain classes so you can prioritize the components / units at the same time.
So then I had a final list, different countings, my formula would certainly give better results:
total_testing_effort_1 = \sum_{i=1,...,7} te_i
(where te_i is the testing effort of each assembly, te_i = p_i*m_i*t_i)
The ridiculous result was: 37h. Thirty-seven hours for seven quite complicated assemblies (e.g. there was one object that could only be created by another which itself depended on a third one).
Still I didn't panic - it's just numbers!
So, I thought: to include the dependency my assemblies have on each other, I can instead calculate the testing effort of the total numbers, thus, by mutliplication, establishing a relation between those numbers that is stronger, that is
total_testing_effort_2 = (\sum t_i)*(\sum p_i)*(\sum m_i)
resulting in another ridiculous estimate of 1039 hours.
Knowing that plan had to be written, and that I couldn't live with 37h and my boss couldn't do so with 1039h I had to find some equally acceptable number. So find some mean. It's obvious that the average isn't good enough, so I used the geometric mean of those two:
total_testing_effort = sqrt(total_testing_effort_1*total_testing_effort_2)
And there it was: a magical 119,1h approx. 25 days of 8h work.
You think this is all hocus pocus? Maybe you're right. But I've found the following to be true, too:
1.) There is an effort estimate we feel good about.
2.) My boss understands the code isn't as easy to be tested as he had thought, and will probably be more forgiving when not "everything" will have been tested.
Of course, I still hope my baby will show some good approximation over time :-)
Do you have other means of estimating costs / setting up test plans for legacy code? Found mistakes or nonsense? Please comment!
Wednesday, August 31, 2011
Pass-fail-criteria - testing a hash function
I recently had to test a hash function. Let me explain:
Hash function
A hash function mathematically is a mapping h : D -> V, where each datum in D is assigned a value in V, normally being V a subset of the natural numbers {0,1,2,...} or a finite sum of subsets of the natural numbers.
The purpose is to find an easy way to identify data of arbitrary type. It can be thought of as a fingerprint of my piece of data. E.g. I want to save images in a table, but I don't want my data base to blow up so I only allow for one copy of an image to be saved. How do I recognise if I already have a copy in my table?
If d is my image and v=h(d) then I can use v as key in my table. If then I want to save another image d' in my table and found that v'=h(d') already is in my table, I'd know I already saved it and shouldn't need to upload it again.
But then hash functions aren't normally (or virtually cannot be) injective. This is the mathematical term for uniqueness of my fingerprints. This would be the case if each piece of data I hash would have a different unique fingerprint.
If two pieces of data have the same fingerprint this is called a collision. Surely, if we can't avoid collisions totally, we'd at least want them to occur almost never.
Test a hash function
Almost never is clearly a relative term. If I have 10 pieces of data, we cannot accept that two have the same fingerprint, saying 100% confidence. You could also feel that you can accept 2 of 100000 pieces of data to have the same hashvalue; this would be 1 - 2/100000 = 99.998% confidence.
I didn't have any specifications telling me what confidence or quality my hash function had to fulfill. The function has the form:
hash(str string, range int): hashkey int, i.e. each string (of arbitrary length) is encoded into a natural number in {0,1,...,range}.
So I reasoned:
- If I have a confidence of 99.99% I personally am fine with my test. As long as
- there's a way to improve confidence by tuning the input of my hash function.
Admittedly 99.99% isn't too much. See discussion below.
So I took 50000 arbitrary distinct strings (randomly generated), hashed them and than looked at how many different hashkeys were produced. Mathematically spoken I calculated:
test_confidence_1 = #h({set of 50000 distinct strings}, 50000) / 50000, where # means 'number of'.
I had a pretty bad result (67% confidence) which was what I could have expected choosing 50000 arbitrary strings but there was a way to improve test confidence:
test_confidence_2 = #h({set of 50000 distinct strings}, 50000*1000) / 50000.
This one gave me 99.995% confidence so I filed a "bug report" about the results. Hopefully developeres will read it and improve the hash function or document the confidence for future users of this hash function.
I learned what a hash function is, how I could test it, and finally:
- As a tester sometimes you will have to setup your own pass-fail-criteria.
Management and Development later still can and hopefully will discuss your results / criteria with you. That's okay. But you cannot do your work waiting indefinitely for some specification. You can use your own criteria to provoke later discussion also; this is why I chose my test confidence just high enough to make the test fail.
Friday, August 12, 2011
Overflow without warning
Testing might also be kind of learning programming. E.g. there's this programming resp. transformation language for structured (text) data, MetaMorphosis.
As I'm trying to figure out how that scripting language works I found myself with little documentation about datatypes. As a mathematician with interest in prime number testing I'm always interested in big integers as in java or C#. So I wrote the following hoping the loop NOT to end with an overflow:
*count := 1, *lastcount:=*count,
!while *count >= *lastcount !do
(*count).echo,
*lastcount := *count,
*count := 10*(*count)
!end
The loop really didn't end by itself throwing any error. But what I saw on my screen kind of surprised me. When I paused it I found that
1946039808 <= -1348927488
seemed to be true. So I asked in pace mode (option -p):
> i *result := (-1348927488) - 1946039808;
> i *result;
> 1000000000
So supposing a < b will be true if and only if 0 < b - a we see that really 1946039808 <= -1348927488.
What have I learned? Well, first you cannot rely on any programming language to tell you about its limitations. Second: overflows do not only influence our results when we calculate with numbers like subtracting or adding, but go also further as seen in the comparison of two of them.
Finding only two a, b such that a-b > max(numberdomain) we cannot be sure a < b will give the expected result.
As I'm trying to figure out how that scripting language works I found myself with little documentation about datatypes. As a mathematician with interest in prime number testing I'm always interested in big integers as in java or C#. So I wrote the following hoping the loop NOT to end with an overflow:
*count := 1, *lastcount:=*count,
!while *count >= *lastcount !do
(*count).echo,
*lastcount := *count,
*count := 10*(*count)
!end
The loop really didn't end by itself throwing any error. But what I saw on my screen kind of surprised me. When I paused it I found that
1946039808 <= -1348927488
seemed to be true. So I asked in pace mode (option -p):
> i *result := (-1348927488) - 1946039808;
> i *result;
> 1000000000
So supposing a < b will be true if and only if 0 < b - a we see that really 1946039808 <= -1348927488.
What have I learned? Well, first you cannot rely on any programming language to tell you about its limitations. Second: overflows do not only influence our results when we calculate with numbers like subtracting or adding, but go also further as seen in the comparison of two of them.
Finding only two a, b such that a-b > max(numberdomain) we cannot be sure a < b will give the expected result.
Subscribe to:
Comments (Atom)