Monday, 15 November 2010

Off the beaten path - Test Automation that can vary (and learn)...


The problem with most software test automation is essentially twofold: for the most part, it's only capable of following the same path, over and over again, and it's incapable of ever learning. It's as dumb today as it was yesterday. Put simply, it's just not as smart as it could be.

What, it can be asked, is such repetitive test automation actually useful for? Well, most importantly, for verifying that nothing gets broken as new features are added to our software; we run our tests to gain a sense of confidence in what's already there.

Worth noting, though, is the fact that we nearly always need to pair our automated tests with hands-on exploratory testing. Except that it never really happens like that. The hands-on exploratory testing gradually gives way to the Manual Monkey Test as testers and managers lose confidence in those unique skills of perception and hunch that make us human, and instead seek something quantifiable, repeatable and reproducible.





So how can we turn this around? Boredom with repetitive work begets automation; automation begets the need for exploratory testing, which somehow, through time pressures, turns into more of the same... and in the meantime, the defects to be found live "somewhere" off the path of this tedious nightmare.

There are no complete solutions presented here, but I hope you might get some idea of the possibilities available.


Well, one thing to do is to make our test automation more intelligent. Exploratory testing essentially involves two skills, once thought to be uniquely the domain of Homo sapiens: 1) the ability to randomize inputs, and 2) the ability to assess and learn whether a given output is appropriate for a non-linear input. Let me elaborate.

One of the great benefits of open-source software for test automation (aside from not being bound to a proprietary language) is the ability to leverage the wealth of extension libraries that come with open-source languages, supported for free by a massive community of volunteers. Tools like Squish and Selenium use Python (among others); Nokia's TDriver uses Ruby.

Without going into too much low-level detail ;-), both Python and Ruby have randomizer functions. In Ruby, one can call rand() with an integer argument: rand(7) returns any number between 0 and 6; Python offers the same thing via random.randrange(7). Here's a recipe in Ruby for a simple sort-and-shuffle of a deck of cards, costing n log n comparisons:
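A minimal sketch of such a shuffle, sorting the deck on a random key (the deck representation here is my own choice for illustration):

```ruby
# Build a 52-card deck as "rank of suit" strings
ranks = %w[A 2 3 4 5 6 7 8 9 10 J Q K]
suits = %w[clubs diamonds hearts spades]
deck  = ranks.product(suits).map { |rank, suit| "#{rank} of #{suit}" }

# Sorting on a random key shuffles the deck in n log n comparisons
shuffled = deck.sort_by { rand }

puts shuffled.first(5)
```

(If speed mattered, Ruby's built-in Array#shuffle does a linear-time Fisher-Yates shuffle instead.)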
Now, this won't be the fastest way if you're dealing with a very large list, but for generating a small set to explore alternative paths, it could well suffice.

Now, this randomizing can apply to just about any input that comes as an index - which means just about anything: selected items in a list, buttons in toolbars, parts of a string or a numerical sequence. Once we're randomizing, we have our inputs covered. But how do we know when to randomize?

Most users out there are going to use a software application in a given, fairly static way - just enough steps to suit their purpose. Configure the music player, cue it up and play the song. But every once in a while, they might deviate from that sequence. So knowing when to randomize means knowing how often, based on probability, to do something completely different.

By way of a practical example... Cucumber has become a popular way to express automated tests in plain language. A Cucumber test may read something like this:

Given the MusicPlayer application is started
When I press the Options softkey
And I select Songs from the menu
And I Select the Song number 1
Then the Now Playing view is opened correctly

Each line in that block relates to a function implemented in the underlying language, for example:
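Stripped of the real Cucumber machinery, that mapping can be sketched in plain Ruby (the step table, the regex and the stubbed player response are all my assumptions for illustration):

```ruby
# A toy version of Cucumber's step matching: each regex maps to a block
STEPS = {
  /^I Select the Song number (\d+)$/ => ->(number) { "Now playing song #{number}" }
}

# Find the first pattern matching the plain-language line and invoke its block
def run_step(line)
  STEPS.each do |pattern, action|
    match = pattern.match(line)
    return action.call(*match.captures) if match
  end
  raise "Undefined step: #{line}"
end

run_step("I Select the Song number 1")
```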
Once we pass a certain threshold (say, the function has been called n times), we become two or three times as likely to deviate from our original number and choose a random index. Doing this effectively requires a trick, though: the ability to visualize a software system as a state machine in 4D. And if we randomize at the function level, our Cucumber test writer gets the benefit of that randomization without having to do anything differently.
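As a sketch of that idea (the threshold and the odds below are arbitrary values of my own, not from any particular tool):

```ruby
# Deviate from the scripted index once a step has been exercised often
# enough; THRESHOLD and the 1-in-3 odds are illustration values only
class AdaptiveChooser
  THRESHOLD = 5

  def initialize(item_count)
    @item_count = item_count   # how many selectable items exist
    @calls = 0
  end

  def choose(scripted_index)
    @calls += 1
    if @calls > THRESHOLD && rand(3).zero?
      rand(@item_count)        # explore: pick any valid index
    else
      scripted_index           # exploit: stay on the scripted path
    end
  end
end

chooser = AdaptiveChooser.new(10)
picks = Array.new(20) { chooser.choose(1) }
```

The first few runs stay on the script; after the threshold, roughly one call in three wanders off the beaten path.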

Once our scripts start to randomize, however, the fixed answer sets for our test runs will no longer suffice; our tests will need the ability to be trained, to learn, and to guess at the correct answer based on previous history. Fortunately, our open-source languages have the tools to let us do just that.


Bayesian classifiers are already in common use in email spam filtering. Python, for example, offers the PEBL and Classifier libraries; Ruby offers the Bishop and Classifier gems. Training with the Classifier gem works along the following lines (note - NOT TESTED):

require 'rubygems'
require 'classifier'

classifier = Classifier::Bayes.new('Song', 'Not_Song')

# a Bayes classifier learns from example text, so we train it
# on sample file names rather than regular expressions
classifier.train_Song 'track01.mp3'
classifier.train_Song 'clip.3gp'
...
classifier.train_Not_Song 'photo.jpg'
...

And the good stuff - where the real demonstration of learning happens - is here:

classifier.classify "bubba_chops.jpg"
#=> "Not_Song"
classifier.classify "song.3gp"
#=> "Song"
classifier.classify "song.m3u"
#=> "Song"

In cases where the script gets it wrong, you have to train it further. But after more and more iterations, it will start to make the correct decision in more and more cases.
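To make that train-and-correct loop concrete without relying on the gem, here's a toy Bayes classifier in plain Ruby (my own sketch - simple token counting with Laplace smoothing, not the Classifier gem's implementation):

```ruby
# Toy naive Bayes over filename tokens, illustrating the
# correct-by-training loop; a sketch, not the Classifier gem
class TinyBayes
  def initialize(*categories)
    @counts = {}
    categories.each { |c| @counts[c] = Hash.new(0) }
  end

  def train(category, text)
    tokens(text).each { |t| @counts[category][t] += 1 }
  end

  def classify(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    best = @counts.max_by do |_category, freq|
      denom = freq.values.sum + vocab + 1      # Laplace smoothing
      tokens(text).sum { |t| Math.log((freq[t] + 1.0) / denom) }
    end
    best.first
  end

  private

  def tokens(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

bayes = TinyBayes.new('Song', 'Not_Song')
bayes.train('Song', 'track01.mp3')
bayes.train('Song', 'clip.3gp')
bayes.train('Not_Song', 'photo.jpg')

bayes.classify('song.m3u')        # an unseen extension can come back wrong
bayes.train('Song', 'song.m3u')   # so we train it with the right answer
bayes.classify('song.m3u')        # and the next guess is correct
```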

This is where your script can think, and make decisions as you do.

And that's no monkey business.







2 comments:

  1. amazing description of a novel way to do automated testing. Have you thought of describing how you put these 2 concepts together to test an arbitrary method?

    How about the results of this experiment? Did you find bugs you would not have otherwise?

  2. Vasco, I hadn't noticed your comment until now, being only an incidental blogger, so please pardon my delayed response. The answers are: Yes (sort of)! and Yes!

    It's possible, for example, to create a PyQt application (test-automated with TDriver) in which one of the underlying Python classes in the SUT application performs metaprogramming on itself, changing based on some input (this isn't exactly arbitrary, but maybe displays some results similar to what you describe). Said in Robert Stack's voice: "More on this story in a future update." Admittedly, though, I've only worked with fairly static stuff until now.

    As for finding bugs off the beaten path: you bet! However, the system needs some extra effort invested in its training, as it tends to point to what it thinks are bugs based on incorrect decisions. Also, Bayesian analysis is a little less mature in the gem available for the Ruby version TDriver supports, but there are other gems that work similarly.
