Automation with visual recognition

Automation with visual recognition

Automation is not a new topic, with most software development QA teams employing its use in one way or another. There is also no lack of tools to choose from.

On desktop, there are the ever popular Selenium and the HP backed HP – UFT (formerly QTP). For mobile, Appium and MonkeyTalk are among the more frequently used solutions.

All of these tools are fine choices for functional and data driven tests due to their object-oriented nature. However, in my experience, there is one type of automation that is seldom mentioned, visual based testing using OCR (Optical Character Recognition) technology.

What is Visual Automation?

Visual automation relies on the appearance of on-screen elements to perform an action. This is different from traditional automation, which relies on the occurrence of elements in the background resources. To accomplish this, a set of pre-defined or determined visual images and/or transitions are stored. Scripts are written to compare the stored images to the current screen appearance in a set sequence to ensure the application is running through the expected on-screen transitions.

Actions can also be scripted in response to on-screen changes. For example, the tools would check for the appearance of a login screen and compare its appearance to the expected result. If the screen matches the expected result, the tool would fill in the user name and password fields by mimicking mouse clicks and keyboard strokes.

Visual automation tools not only watch the screen for the appearance of specific elements but they can also act on element transitions, the disappearance of elements, or elapsed time. Actions against these on-screen elements mimic human actions. The tools can attempt to perform functions such as clicking, double-clicking, dragging and dropping, filling forms, etc.

The range of action is at the full extent of what humans can do. There are several tools currently available to perform visual automation, including Squish and my favorite, Sikuli.

Why Visual Automation?

Visual automation acts much closer to human behavior than object-oriented automation tools. The actions and reactions are only based on visual stimuli to which humans can react. This allows testing to be conducted in a way that is much closer to the human experience than any other type of automation. Consider the following examples:

In the case above, a real human end user would have issues with the page but automated tools would have no trouble finding the login button as long as only the front-end graphic is missing.

The above test would pass when using object-oriented automation where the tool is used to find if an element exists without considering its proper placement whereas if visual automation is utilized, the defect would be properly identified.

The above scenarios are only a couple examples from a long list of scenarios where an automation tool that behaves similarly to a human user would be more useful.

Another advantage of an OCR-based automation tool is that it is not bound to an application while some other tools have limited access or even no access to the system outside of the application being tested. Visual automation tools can watch the entire screen for any change regardless of source. This way, it is possible to launch multiple unrelated applications and watch for their interactions. It is also possible, if one were to be inclined to do so, to launch a virtual machine and then launch multiple applications within it, with all of them under the control of a single automation tool. It can be quite powerful under the right circumstances.

The Case Against Visual Automation Tools

Visual automation also has some glaring disadvantages. If it didn’t, it would be much more widespread.

Firstly, it is not well suited for repetitive fast-paced testing. This is typical in a stress test scenario. Due to the nature of human user mimicry, this automation waits for the application to fully load and respond before proceeding. Therefore, testing time is usually much longer than with object-oriented automation. As a secondary effect of this, visual automation is also ill-suite for fast data verification. It is possible to run through a set of data (possibly stored in a spreadsheet or csv) but it would be much more time consuming than with object-oriented automation tools.

Secondly, it can’t handle multiple instances of the same application being tested. This type of automation watches the monitor for predetermined screens to show up. If multiple instances of the same or even similar screens appear at the same time, it can quickly become confusing. This is an unfortunate side effect of the ability to watch the entire system screen rather than just the single application.

Lastly and maybe most importantly, there is potentially a higher maintenance cost. Due to the fact that expected results need to be stored and updated, there would be a much higher human involvement in the maintenance of the comparison banks. Every change to the visual look would require capturing and restoring the new expected result. Even a change in transition would require script updates. Now, of course, the usual tricks of modulation and function extractions would work but this only reduces labor without eliminating it.

Opening New Doors

In the world of automation, visual automation (OCR-based) tools are often overlooked even though there are plenty of scenarios where they could offer a superior solution. By their nature of behaving closer to human end users, they can catch errors that would be overlooked by object-oriented tools. Having system wide influence can also open new doors in automation.

Yes, there are indeed several glaring shortcomings in visual automation, as I mentioned above, but I’m not saying other tools are not needed or that any tool should be used in exclusivity. For any serious automation of testing, a QA manager should evaluate all available tools and utilize any and all tools to their strengths. I just don’t want you to miss out on OCR tools and the advantages they offer.

Find out how we can work together to bring your ideas to life.

Find out how we can work together to bring your ideas to life.

Thank you for contacting us