Introduction to Active Contours and Visual Dynamics

Andrew Blake, Visual Dynamics Group, Dept. Engineering Science, University of Oxford.
Computers have been getting better and better at seeing movement on video. How is it that they read lips, follow a dancing girl or copy an actor making faces?

The problem of seeing moving objects

The study of ``Computer Vision'' has been going on for about 30 years. Early work was done as part of the effort in ``Artificial Intelligence'' that took off in the 1970s. Researchers in pioneering centres such as MIT, Stanford and Edinburgh intended to get computers to reproduce the human ability to see objects, recognise them and make sense of their movements. It all proved very much harder than anyone had anticipated.

In the 1980s the field of Computer Vision sobered up. It became clear that a solid theoretical foundation was needed if any progress was going to be made. Ambitions needed to be toned down too. Before trying to replicate human abilities, it would be satisfying, and quite hard enough, to try to emulate the visual skills of a lizard or a bee, perhaps. It is the sheer breathtaking generality of the human ability to see that has been so difficult to capture. The most successful computer vision systems to date are ones that are strongly specialised: sorting potatoes or inspecting electrical circuit boards for instance.

Tracking agile motion: a video sequence of a girl dancing to a Scottish reel is tracked over about 10 seconds. Effective anticipation by the computer of likely dance movements is crucial to enable it to ``see'' such agile movements. This is done using the recently developed ``Condensation algorithm''. See the results for yourself in the following movie clip (1.1 MByte MPEG). (Figure courtesy of M. Isard.)

A key factor in recent successes in making computers ``see'' moving objects has been in getting the computer to anticipate. Suppose the computer is required to follow the trajectory of an intruder, captured on a security video. It helps enormously if the computer is programmed to expect a certain range of shapes (``roundish'') for the intruder's head. That information can be used to distinguish the true head from bits of visual ``flotsam'' in the room --- books on a shelf or pictures on the wall, for instance. Not only shape but also movement can be anticipated by the computer. The intruder's motion is expected to be smooth and largely horizontal (parallel to the floor). This ``prior'' information can be used to get the computer to concentrate, fixing on the moving head, without being continually distracted by the bright and attractive pieces of flotsam nearby. This can work successfully even when the motion is particularly vigorous, as with the dancing girl shown above. Tracking the blowing leaf, shown next, is an even more taxing problem because it is camouflaged --- the rest of the bush is conspiring to make things as hard as possible by imitating the selected foreground object.

Tracking motion against camouflage: Tracking the outline of a foreground object like the leaf allows it to be separated automatically from the background. It can then be re-lit using computer graphics techniques, as shown in the movie clip (2.3 MByte MPEG). This is a special effect which previously could only be achieved by ``blue-screening'' from specially prepared video footage. (Figure and movie courtesy of M. Isard.)

This whole art of programming anticipation into a computer is founded, perhaps surprisingly, on the mathematics of probability. This is because the shape of an object like the intruder's head is not known precisely in advance, but is constrained to lie in a certain probable range. Similarly the motion of the intruder is not known exactly in advance; that would beg the question that the computer vision system is supposed to answer. It is known, however, that only movements in a certain range are at all likely. Quantifying that range, in a way that is digestible by a computer program, can be done neatly using the ``language'' of probability.
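
To make the idea concrete, here is a minimal sketch in Python of one cycle of a sampling-based tracker, in the spirit of the Condensation algorithm mentioned above. It is an illustration only, not the implementation behind the results shown here: the state is reduced to a single position and velocity rather than a full contour, and every name and number in it (condensation_step, process_sigma, measure_sigma, the 500 samples) is an assumption made for the example. The probable range of motions is represented by a cloud of weighted guesses, each of which is moved according to the expected motion and then scored against what the camera actually reports.

```python
import math
import random


def condensation_step(particles, weights, measurement, rng,
                      process_sigma=1.0, measure_sigma=2.0, dt=1.0):
    """One resample-predict-measure cycle over (position, velocity) guesses."""
    n = len(particles)
    # 1. Resample: draw guesses in proportion to their current weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Predict: apply the motion prior (smooth motion plus random jitter).
    predicted = []
    for x, v in resampled:
        v_new = v + rng.gauss(0.0, process_sigma)
        predicted.append((x + v_new * dt, v_new))
    # 3. Measure: weight each guess by how well it explains the observation.
    new_weights = [math.exp(-0.5 * ((x - measurement) / measure_sigma) ** 2)
                   for x, _ in predicted]
    total = sum(new_weights)
    if total > 0.0:
        new_weights = [w / total for w in new_weights]
    else:
        # Every guess was wildly wrong; fall back to equal weights.
        new_weights = [1.0 / n] * n
    return predicted, new_weights


def estimate(particles, weights):
    """Weighted mean position over all guesses."""
    return sum(w * x for (x, _), w in zip(particles, weights))


def demo(seed=0, n=500, frames=20):
    """Track a target moving at 2 units per frame from noisy measurements."""
    rng = random.Random(seed)
    particles = [(rng.uniform(-10.0, 10.0), rng.uniform(-3.0, 3.0))
                 for _ in range(n)]
    weights = [1.0 / n] * n
    for t in range(1, frames + 1):
        measurement = 2.0 * t + rng.gauss(0.0, 2.0)
        particles, weights = condensation_step(particles, weights,
                                               measurement, rng)
    return estimate(particles, weights)
```

Calling demo() follows a simulated target for 20 frames and returns the estimated final position, which should lie close to the true value of 40. Widening process_sigma makes the tracker more tolerant of agile motion, at the price of being more easily distracted by clutter.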


Applications of the technology

Applications of the new technology in Computer Vision are multiplying. Several examples are given here.

Actor-driven facial animation

A deforming face can be reliably tracked to relay information about the variation over time of expression and head position to a Computer Graphics animated face. The relayed expression can be reproduced or systematically exaggerated. Tracking can be accomplished in real-time, keeping pace with the rate at which new video frames arrive (50 frames per second). This allows the actor to be furnished with valuable visual feedback. A modicum of make-up is applied to the face to make features stand out. An example of real-time reanimation is illustrated for the cartoon cat shown. This was done using two workstations, linked by network, one for visual tracking and one for mapping tracked motion onto the cat animation channels and display.

Actor-animated cat: Tracked facial motions drive input channels to a cartoon cat, programmed with some exaggeration of expression, as shown in the following movie clip (1.1 MByte MPEG). (Figure courtesy of B. Bascle.)


Human-computer interaction

Interacting with the real and the virtual: hands tracked automatically can manipulate and indicate both real and imaginary objects -- the so-called ``digital desk'' concept. Tracking works best if different kinds of motion are anticipated, as this movie clip (1.0 MByte MPEG) shows. (Figure courtesy of M. Isard.)

Colour codes in the clip above indicate the kind of motion that is being anticipated (red: smooth curve; green: shading; blue: static). The method used to do this is an extension of the Condensation algorithm to use ``mixed'' states. This means that, at each instant, the algorithm reasons about which class of motion is the current one. This helps both with anticipation, making it more specific, and -- for free -- also classifies the motion.
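
The flavour of a ``mixed'' state can be sketched in a few lines of Python. What follows is a simplified illustration under stated assumptions, not the method used to produce the clip: each guess carries a motion-class label alongside a position and velocity, only two hypothetical classes are used (static and smooth), and the switching probabilities, noise levels and function names are all invented for the example. At each instant a guess first chooses its motion class, then moves as that class dictates. Adding up the weights of the guesses in each class gives the classification.

```python
import math
import random

# Hypothetical motion classes and switching probabilities (illustrative only).
TRANSITIONS = {
    "static": {"static": 0.9, "smooth": 0.1},
    "smooth": {"static": 0.1, "smooth": 0.9},
}


def predict(particle, rng):
    """Sample the next motion class, then move according to that class."""
    label, x, v = particle
    row = TRANSITIONS[label]
    new_label = rng.choices(list(row), weights=list(row.values()), k=1)[0]
    if new_label == "static":
        # Static class: position barely changes and velocity is reset.
        return (new_label, x + rng.gauss(0.0, 0.2), 0.0)
    # Smooth class: velocity drifts gently and position follows it.
    v_new = v + rng.gauss(0.0, 0.5)
    return (new_label, x + v_new, v_new)


def mixed_state_step(particles, weights, measurement, rng, measure_sigma=1.5):
    """Resample, predict with class switching, then reweight."""
    n = len(particles)
    resampled = rng.choices(particles, weights=weights, k=n)
    predicted = [predict(p, rng) for p in resampled]
    new_weights = [math.exp(-0.5 * ((x - measurement) / measure_sigma) ** 2)
                   for _, x, _ in predicted]
    total = sum(new_weights)
    if total > 0.0:
        new_weights = [w / total for w in new_weights]
    else:
        new_weights = [1.0 / n] * n
    return predicted, new_weights


def class_probabilities(particles, weights):
    """Total weight per motion class: the classification that comes for free."""
    probs = {label: 0.0 for label in TRANSITIONS}
    for (label, _, _), w in zip(particles, weights):
        probs[label] += w
    return probs


def demo(seed=0, n=1000):
    """Target rests for 10 frames, then moves at 2 units per frame for 15."""
    rng = random.Random(seed)
    particles = [(rng.choice(list(TRANSITIONS)), rng.uniform(-2.0, 2.0), 0.0)
                 for _ in range(n)]
    weights = [1.0 / n] * n
    true_x = 0.0
    for t in range(25):
        if t >= 10:
            true_x += 2.0
        measurement = true_x + rng.gauss(0.0, 0.5)
        particles, weights = mixed_state_step(particles, weights,
                                              measurement, rng)
    return class_probabilities(particles, weights)
```

Calling demo() returns the estimated probability of each class at the end of the sequence. Because the simulated target finishes in steady motion, most of the weight should end up on the smooth class.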


Traffic monitoring

Roadside video cameras are already familiar in systems for automated speed checks. Overhead cameras, sited on existing poles, can relay information about the state of traffic --- its density and speed --- and anomalies in traffic patterns. Contour tracking is particularly suited to this task because vehicle outlines form a tightly constrained class of shapes, undergoing predictable patterns of motion. Already the state of California has sponsored research leading to successful prototype systems. Work in our laboratory, monitoring the motion of traffic along the M40 motorway near Oxford, is illustrated here.

By automatically tracking cars, the emergency services could obtain rapid warning of an accident or traffic jam. (Figure courtesy of S. Rowe, N. Ferrier, M. Isard.)

Vehicle velocity is estimated by recording the distance traversed by the base of a tracked vehicle contour over a known elapsed time. The measured distance is in image coordinates and this must be converted to world coordinates to give true distance. Analysis of speeds, in this example, showed clearly the typical pattern of motorway traffic, with successively increasing vehicle speeds towards the outside lane of the carriageway.
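
The conversion from image to world coordinates can be sketched as follows, assuming that the road surface is flat and that a 3x3 ground-plane mapping H (a homography) from pixels to metres has been obtained by calibrating the camera beforehand. That assumption, the function names and the example numbers are hypothetical, chosen for illustration. They are not a description of the system used on the M40.

```python
import math


def image_to_world(H, u, v):
    """Map an image point (pixels) to road-plane coordinates (metres)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    # Dividing by w accounts for perspective foreshortening.
    return x / w, y / w


def speed_kmh(H, p_start, p_end, elapsed_s):
    """Speed of a contour base that moved from p_start to p_end (pixels)."""
    x0, y0 = image_to_world(H, *p_start)
    x1, y1 = image_to_world(H, *p_end)
    metres = math.hypot(x1 - x0, y1 - y0)
    return metres / elapsed_s * 3.6  # metres per second to km per hour


# Hypothetical calibration: a camera looking straight down, 0.05 m per pixel.
H_EXAMPLE = [[0.05, 0.0, 0.0],
             [0.0, 0.05, 0.0],
             [0.0, 0.0, 1.0]]
```

With this made-up calibration, speed_kmh(H_EXAMPLE, (100, 200), (100, 400), 0.4) corresponds to 10 metres covered in 0.4 seconds, about 90 km/h. A real roadside camera views the road obliquely, so the bottom row of H would not be (0, 0, 1) and equal pixel distances would map to different road distances near and far from the camera.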


Surveillance

A computer vision system follows an intruder on a security camera. The camera is mounted on a computer-controlled pan-tilt platform driven by visual feedback from the tracked contour. (Figure courtesy of S. Rowe, M. Isard.)


Marker-free biometrics

Biometrics involves the measurement of limb motion for the analysis of gait, as a tool for planning corrective surgery. The tool is also useful for ergonomic studies and anatomical analysis in sport. It is related to the facial animation application above, but more taxing technically. Again, marker-based systems exist and are commercially successful as measurement tools in both biology and medicine, but it is attractive to replace them with marker-free techniques. There are also, increasingly, applications in Computer Graphics for whole body animation. Capture of the motion of an entire body from its outline looks feasible but several problems remain to be solved: the relatively large number of degrees of freedom of the articulating body poses stability problems for trackers; the agility of, say, a dancing figure requires careful treatment of ``occlusion'' --- periods during which some limbs and body parts are obscured by others.

Tracking the articulated motion of a human body is applicable both to biometrics and clinical gait analysis and to actor-driven whole body animation -- see movie clip (0.9 MByte MPEG). (Courtesy of J. Deutscher.)


Further Information


Last updated June 1999