Interview by Lars Frantzen at ICST, Berlin, March 21, 2011
Dr. Wolfgang Grieskamp has worked for Google Inc. on engineering productivity and testing tools since April 2011. Before that he was a Principal Architect in the Microsoft Windows division, helping ensure the quality of Windows client/server and server/server protocol documentation. He devised a testing approach which convinced decision-makers and which was applied in more than 300 person-years of testing, more than half of them using model-based testing with SpecExplorer. Before that he was at Microsoft Research, where he developed a series of model-based testing tools, leading to SpecExplorer 2010, a model-based testing solution integrated into Visual Studio.
Dr. Grieskamp has recently switched to Google, but gave this interview while he was still at Microsoft.
Model-Based Testing Community: How did you come to model-based testing?
Wolfgang Grieskamp: Actually through the backdoor!
MTC: What exactly does that mean?
WG: When I started at Microsoft Research in 2001, there was a modeling language called AsmL which was being developed in the group of Yuri Gurevich. It was based on Abstract State Machines. A very nice language, ahead of its time in some ways. But we couldn’t really find any application for it since the idea was modeling systems just for the sake of modeling. That wasn’t exactly the killer application to be sold within Microsoft. Also at that time Tony Hoare gave a couple of talks. He had joined Microsoft a little earlier.
MTC: He is at Microsoft Research in Cambridge, isn’t he?
WG: Right. He told research folks about the huge testing problem which companies like Microsoft had, and so we thought that maybe testing would be a good application for modeling. So we took the existing implementation of AsmL, which was already able to explore the various non-deterministic execution paths: the language included choice constructs for non-determinism, so the functionality needed to explore the possible paths was already part of the implementation. Nikolai Tillmann and I played around with it for two months and turned it into a testing tool. So that was the beginning. And it sold pretty well since there were very open-minded people at Microsoft. We were in touch with them before we had even finished the tool, so they could influence the development and get it the way they wanted it. The first thing we did was to apply it to some Web-service-based testing.
MTC: Today you have talked about SpecExplorer. Is there still something left from the original AsmL approach in SpecExplorer, or is it a completely different system?
WG: In the latest version of SpecExplorer, AsmL is no longer present. First there was the AsmL tool, which we revamped into the first version of what is now called SpecExplorer. The current third version of SpecExplorer is no longer AsmL-based but .NET-based, so it works on CIL, the intermediate language of the .NET framework. In this sense it could work with any .NET language, be it AsmL, C#, or VB.
MTC: This intermediate CIL language is basically bytecode?
WG: Yes, exactly.
MTC: We are already deep in the topic of modeling – excellent! One of the main characteristics of SpecExplorer is that you can use a common programming language for modeling, so that you don’t have to learn a new dedicated one, like AsmL. How important would you consider this feature for the success of a modeling and testing framework?
WG: When we moved from version 1 of SpecExplorer to version 2, one thing we did was to interview our customers and ask them what they would prefer. For instance, one question was: ‘Would you prefer to lose some of the nice features which AsmL has to offer in exchange for getting a mainstream language with full IDE integration?’ This integration is called IntelliSense in the Microsoft world; it comprises code completion, structuring, navigation, etc. We got a clear answer – our customers preferred the IntelliSense way. So we followed their wishes and moved from modeling in a domain-specific language (DSL) to a well-known mainstream language.
MTC: But these days DSLs are again very popular, such as the Eclipse Xtext framework, which generates a full-blown editor and model transformation support based on a given DSL grammar.
WG: I think there’s no silver bullet. If you go for the mainstream language approach, you definitely avoid some of the devils you’d have to fight when coming up with a new language. By focusing on the functionality of the model-based testing instead of the modeling language, it’s possible to decouple different concerns. That’s the big advantage, but it comes with the disadvantage that you may lose some of the abstraction capabilities, in particular when it comes to domain-specific areas, for instance protocols with certain communication mechanisms. You can design a language with channel-based communication like CSP, and it may in fact be much nicer to model your protocol problem with a language tailored to this domain. I wouldn’t say there’s a general recommendation.
And if you look at the level of sophistication of editing support in modern IDEs, I doubt that you’d ever get to the same level by just defining a grammar in a generic DSL toolkit.
MTC: Another important difference is whether you model textually or graphically. Would you like to include graphical models in SpecExplorer, as Conformiq and Smartesting have them?
WG: We’re experimenting with UML notations for SpecExplorer because at its core it’s language-agnostic and you could basically combine both types. But this didn’t meet with much interest in the community we dealt with at Microsoft, which consists of software developers and test engineers. Everyone has a software development background. For those people graphical notations don’t really matter. Instead, what would matter a lot to them is being able to enter a textual notation which is then converted into a graphical notation, so that, for example, sequence charts could be generated which represent test cases. That would be very handy.
UML is very well suited for communicating models, but as a programmer you don’t really need that, you’re more used to textual languages and you’re probably also more productive using these. On the other hand there are domains in which, for instance, business analysts work. Usually they lack a programming background. For those people graphical models may be helpful to work more productively. I know various applications in industry in which scenario-oriented or use-case-oriented modeling is used by business analysts to derive models from customer requirements. They do so at a relatively high level. In such cases I think graphical notations are very useful.
MTC: Another common differentiation is between offline and online (aka on-the-fly) testing. Today you have mentioned that online testing may appear almost magical, at least to the tester. Is that a rather subtle criticism of online testing?
WG: We support both. But in our experience a test algorithm which is based on some degree of randomness – even if it is stochastically well understood that the randomness will be kind of compensated when you do it long enough – is very hard to sell to practitioners. Also people want to inspect the generated test cases. That sounds crazy, almost as in the old times, when the first compilers came up and people wanted to inspect the generated assembler code. Today that doesn’t sound very reasonable, but it’s still quite common in the testing world.
Another point with online testing is that it’s often relatively unpredictable how long it will run, at least if you aim at specific coverage metrics. Testing is often tied to a software development procedure: developers check code into a repository, then the sources are compiled on the server side and the corresponding test cases are executed. In this case the testing procedure must be highly predictable with respect to the time it takes.
I think both kinds of testing should be combined, ideally based on the same models, maybe with some slight modifications and instrumentations. At Microsoft we use models for deriving test cases for the so-called basic verification tests, which take place when code is checked in. We use the same models to do longer, for example, overnight lap-based testing. In these cases online, that is, on-the-fly techniques are used. Stress testing is done via online approaches as well.
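The on-the-fly style described above can be sketched as a random walk over whatever actions the model currently enables, with each step checked against the system under test. The following is an illustrative Python sketch, not SpecExplorer’s actual API; all names (`CounterModel`, `CounterSUT`, `online_test`) are invented for the example:

```python
import random

# Illustrative on-the-fly (online) MBT loop: the model is a simple
# bounded counter, the SUT is an implementation tested against it.
# All names here are invented for the example.

class CounterModel:
    """Specification model: a counter that must stay within [0, 3]."""
    def __init__(self):
        self.state = 0

    def enabled_actions(self):
        actions = []
        if self.state < 3:
            actions.append("inc")
        if self.state > 0:
            actions.append("dec")
        return actions

    def step(self, action):
        self.state += 1 if action == "inc" else -1
        return self.state  # the output the SUT is expected to produce

class CounterSUT:
    """System under test (here: a conforming implementation)."""
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

    def dec(self):
        self.value -= 1
        return self.value

def online_test(model, sut, steps, seed=0):
    """Random walk: pick an enabled action, fire it on the SUT, and
    compare the SUT's output with the model's prediction."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    for _ in range(steps):
        action = rng.choice(model.enabled_actions())
        expected = model.step(action)
        actual = getattr(sut, action)()
        if actual != expected:
            return False  # conformance failure observed
    return True

print(online_test(CounterModel(), CounterSUT(), steps=100))  # True
```

Fixing the random seed makes a given run reproducible, which addresses part of the predictability concern raised above, though the number of steps needed to reach a coverage goal remains hard to predict.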
MTC: Ok, after being quite technical already, let’s take a step back. What’s your specific task within the Microsoft test team?
WG: My current role is what they call a ‘software architect’ at Microsoft, which is a software developer who is involved in writing code – in my case, for the tools used for testing. However, the focus is on the overall architecture and design of a process or product, or a tool which is used to produce a product. So it is really a combination of being a developer on the one hand and having technical responsibility for the design strategy on the other hand. For the past five years my job was split between two tasks. One was designing a testing process for the Microsoft Protocol Documentation Program, which I have talked about today. I did this together with other people like Keith Stobie and Nico Kicillof. In this project we had to create from scratch an organization which had to grow very fast to solve this documentation testing problem. The other task was designing and continuing to develop the SpecExplorer testing tool. Other projects came out of that, such as model-driven development for protocols. I can’t tell a lot about it since it’s still confidential, but it’s basically the follow-up of the previous stuff.
MTC: That sounds interesting. So here we are at ICST, which has become quite a huge conference. Plus there are dozens of testing workshops nowadays. Obviously testing has definitely reached academia. What about industry? Does testing research get more attention, or is this rather an academic hype?
WG: It’s absolutely no hype, and I’m actually quite happy to see this. I think research has neglected testing for quite some time. As a consequence, big corporations which invest a lot in testing are sometimes more advanced than academics. It’s still hard to find a university course about testing which covers all the state-of-the-art expertise which industry has. It may sound a little provocative, but it’s good that in academia the importance of testing has finally been recognized. The number of conferences and workshops in the testing domain is growing, and ever more researchers and PhD students deal with the topic and teach it to students. About 50% of software development costs go into quality assurance, and that share is rising. If that’s so very important for software engineering, we really should take some action to improve the situation. I don’t know the exact numbers, but most people who study computer science and later work in industry will, to their surprise, be asked to test software rather than to develop it. And in most cases they have learned nothing, or at least very little, about it.
MTC: One way to try to bridge this gap in Europe is via certification. At least in some European countries this is quite a success now, as the ISTQB certifications demonstrate. You attend a seminar for a couple of days and then do a multiple choice test on a certain testing topic, starting with the basics. Nowadays, at least in Germany, you’ll be asked for this certification in nearly every testing-related job offer. Is there something similar in the US?
WG: A lot happens inside the companies, once people have joined corporations like Microsoft. They will go to lots of trainings, like productivity engineering and other educational programs. Probably these trainings cover similar contents as the certifications you mentioned. I haven’t really heard about external certification programs in the testing domain, but it’s a great idea – they should import it from Europe!
MTC: Let’s get back to MBT. Have MBT tools reached industrial strength yet?
WG: Not really, but they’re quite close to it.
MTC: What’s missing?
WG: There’s a lack of commitment from companies like Microsoft to actually make such tools part of their core products. To be very frank about it, if you look at SpecExplorer, it’s a so-called Power Tool for Visual Studio, which means it’s not an actual part of Visual Studio. Looking at the licensing conditions, you’ll see that these Power Tools are freely available. However, they don’t come with the maintenance guarantees you get when buying a fully licensed version of Visual Studio – which will be maintained for seven years or something like this. That’s a big obstacle.
MTC: So SpecExplorer is a kind of add-on for Visual Studio?
WG: Correct. Sometimes such add-ons become part of the core, which is the important step. The problem here is that the technology is rather sophisticated. Both SpecExplorer and Pex are developed at Microsoft Research and are based on the theorem prover Z3. It’s pretty hard to get rocket science like this into a product group which in turn will always need the backup of Microsoft Research. If you look at other players, that is, smaller companies, the situation is pretty much the same, although they have a genuine commercial product, which SpecExplorer is not. These companies are small, which makes it a risk for a big company to commit to using their tools since the small player might go out of business. I, just like more and more people at Microsoft, believe that the best way to make MBT tools a success is to make them open source. This would enable companies to maintain these tools themselves if they’re no longer maintained by the original developers. This would help a lot to make these technologies more widely used.
Another very important aspect is that more standards for MBT are needed. Right now there’s a series of standards coming up from ETSI, the folks who also standardized TTCN-3. The first standard of the series deals with concepts of MBT to ensure that people are on the same page, since there are so many different notions of MBT.
MTC: They’re fixing the terminology and concepts for MBT?
WG: Right. The next step, which is still under discussion, might be to create an exchange format, like common semantics for MBT tools. With that you would no longer be bound to one particular tool provider.
MTC: That involves a common modeling format, or is the focus more on something like XMI for MBT?
WG: We might call it a ‘common metamodel’, as it is called in the UML world. Formerly this was just called an exchange format. The aim is to be able to write a model in tool A and then move it to tool B. That’s very challenging. Quite a number of parties are involved in this initiative, among them Conformiq, Microsoft, Nokia, and Siemens.
In my opinion these are the main strategies to improve acceptance of MBT. You need to give people the security that they can rely on the technology in a long-term perspective, based on an accepted terminology and format.
MTC: MBT often seems to be considered a silver bullet for testing issues. Where do you think it is best applied, and where not?
WG: It’s well applicable whenever you’re interested in systematic functional coverage, like requirements coverage, which I have discussed this morning. Nothing beats it at this, I think. We understand well how MBT is used in the functional or behavior domain, like complex contracts on modalities of your system, as typically found in network protocols. But you’ll often also encounter MBT when testing user interfaces and various other applications. It also plays an important role in data-intensive applications, where the answer of the SUT to a certain query is computed by a non-trivial algorithm which depends on the current state of the system. Here MBT can also be a very promising approach.
MTC: Besides adding data to the models, adding time is an important topic. Even most of today’s A-MOST talks have dealt with that. Are timing aspects really as important as it currently appears in research? Should time be in the model or in a dedicated extra model?
WG: When we dealt with network protocols at Microsoft, we were of course confronted with time constraints. We figured out that most of the timing problems could be dealt with by modeling timers and putting them into the testing adapter. Abstract time, like expiring timers, was treated as events or outputs. That worked for network protocols, where most of the timing problems are basically timeouts. But for embedded systems with hard real-time requirements it may be too cumbersome to simulate time like that. In these cases time probably should be a first-class citizen in your modeling notation.
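The timer-in-the-adapter idea can be sketched as follows: the model only ever sees an abstract TIMEOUT event, while the adapter decides when that event fires. This is an illustrative Python sketch, not Microsoft’s actual protocol test code; `RetransmitModel` and `adapter_trace` are invented names, and the real timer wait is reduced to a boolean for brevity:

```python
# Abstracting time away: timer expiration is treated as just another
# observable event. The model never deals with clocks; the adapter does.

class RetransmitModel:
    """Protocol fragment: after SEND, either ACK or TIMEOUT may follow;
    a TIMEOUT must be answered by a retransmission (another SEND)."""
    def __init__(self):
        self.awaiting_ack = False

    def observe(self, event):
        if event == "SEND":
            assert not self.awaiting_ack, "cannot send while awaiting ACK"
            self.awaiting_ack = True
        elif event == "ACK":
            assert self.awaiting_ack, "unexpected ACK"
            self.awaiting_ack = False
        elif event == "TIMEOUT":
            assert self.awaiting_ack, "timer fired without pending send"
            self.awaiting_ack = False  # model now expects a resend

def adapter_trace(ack_arrives_in_time):
    """Test adapter: maps concrete timing onto abstract events.
    A real adapter would wait on an actual timer here."""
    if ack_arrives_in_time:
        return ["SEND", "ACK"]
    return ["SEND", "TIMEOUT", "SEND", "ACK"]

for timely in (True, False):
    model = RetransmitModel()
    for event in adapter_trace(timely):
        model.observe(event)  # model checks each abstract event
print("both traces conform")
```

The model stays unchanged whether the adapter uses a real timer, a simulated clock, or a hard-coded trace as here; that separation is the point of the abstraction.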
MTC: Like a clock variable?
WG: Yes, like that. There’s the work of the UPPAAL group, and it’s very clear that test generation depends upon time in many cases. Sometimes it can be simulated via abstract events, but in many cases you need to incorporate it fully in your algorithm. Tools like Conformiq already do that.
MTC: What do you think are the main challenges for MBT in the future? You’ve already mentioned industry acceptance.
WG: I still think it’s acceptance, so we need to find a way that people can rely on modeling notations, tools, and approaches as they do on other mainstream technologies. I get the impression that it’s no longer an issue of convincing people of the effectiveness of MBT. Still there are many little challenges concerning the details of the tooling, how to make it convenient and workable. Like the test adapter problem, which is always an interesting one: often the adapter takes more time to develop than the model itself does. This is one thing we should work on. Another problem is that everybody has his or her own test control management system. And there’s still the challenge of what you worked on, too – we still don’t have the push-button technology for defining a model with an open environment in which you have symbolic parameterized actions. We want to push a button to create the minimal instantiation of concrete parameters, so that, for example, each transition is covered. This is basically what Pex does for one transition, since it’s close to a unit test tool. Could that be done for a full transition system? We’ve been working on that for a long time; it heavily depends on the power of the underlying constraint solver. We now do all the slicing, which works quite nicely via basically instrumenting the models for test selection, and we want to continue in that direction. But especially for analyzing and viewing the model we’d like to have this push-button approach giving the minimal set of test cases. There are many strategies and heuristics these days for doing the symbolic exploration, but I think the future is mainly in symbolic computation, like using SMT solvers, possibly combined with heuristic or search-based approaches.
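The parameter-instantiation problem can be illustrated at toy scale: given a symbolic action withdraw(amount) whose model has three guarded transitions, find one concrete amount per transition. Real tools derive such values with constraint or SMT solving; this Python sketch simply searches a small finite domain, and the names (`withdraw_transition`, `instantiate_for_coverage`) are invented for the example:

```python
# Concrete parameter selection for transition coverage: pick amounts so
# that every guarded branch of the symbolic withdraw action fires once.

BALANCE = 10

def withdraw_transition(amount):
    """Model of withdraw: reports which guarded transition fires."""
    if amount <= 0:
        return "rejected_nonpositive"
    if amount > BALANCE:
        return "rejected_insufficient"
    return "accepted"

def instantiate_for_coverage(domain):
    """Keep the first concrete parameter found for each transition,
    yielding one test input per branch of the model."""
    chosen = {}
    for amount in domain:
        transition = withdraw_transition(amount)
        if transition not in chosen:
            chosen[transition] = amount
    return chosen

tests = instantiate_for_coverage(range(-2, 15))
print(tests)
# {'rejected_nonpositive': -2, 'accepted': 1, 'rejected_insufficient': 11}
```

Brute-force enumeration only works for tiny domains like this one; as noted above, scaling it to realistic models is exactly where constraint solvers come in.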
Right now, for instance, we’re working on combining bounded model checking with symbolic exploration: You create a bounded finite system, then you instantiate all variables, and in the end you collapse it together into a finite state machine. Embedding knowledge you have about the program itself, for instance, a symbolic analysis of the source code, may be a promising candidate to be integrated into the testing process as well.
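Bounded exploration collapsing into a finite state machine can be sketched in miniature: explore reachable states breadth-first up to a bound, merge identical state values, and record the transition relation. An illustrative Python sketch, far simpler than the symbolic technique described above (the `explore` function and the counter actions are invented for the example):

```python
from collections import deque

# Bounded exploration sketch: enumerate reachable states up to a bound,
# collapsing identical state values into one node of a finite machine.

def explore(initial, actions, bound):
    """BFS over reachable states; merging identical state values is
    what collapses the exploration into a finite state machine."""
    transitions = set()
    seen = {initial}
    queue = deque([initial])
    while queue and len(seen) <= bound:
        state = queue.popleft()
        for name, step in actions.items():
            target = step(state)
            if target is None:
                continue  # action not enabled in this state
            transitions.add((state, name, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, transitions

# A tiny counter model: inc is not enabled beyond 3, reset only above 0.
actions = {
    "inc": lambda s: s + 1 if s < 3 else None,
    "reset": lambda s: 0 if s > 0 else None,
}
states, transitions = explore(0, actions, bound=10)
print(sorted(states))   # [0, 1, 2, 3]
print(len(transitions)) # 6 transitions in the collapsed FSM
```

In the symbolic variant WG describes, states hold constraints rather than concrete values and a solver instantiates them, but the collapse into a finite transition relation follows the same pattern.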
MTC: Thanks a lot, Wolfgang, for this interesting interview!