Introduction to Active Contours and Visual Dynamics
Andrew Blake,
Visual Dynamics Group,
Dept. Engineering Science, University of Oxford.
Computers have been getting better and better at seeing movement on video.
How is it that they read lips, follow a dancing girl or copy an actor
making faces?
The problem of seeing moving objects
The study of ``Computer Vision'' has been going on for about 30 years.
Early work was done as part of the effort in ``Artificial
Intelligence'' that took off in the 1970s. Researchers in pioneering
centres such as MIT, Stanford and Edinburgh intended to get computers to
reproduce the human
ability to see objects, recognise them and make sense of their
movements. It all proved very much harder than anyone had anticipated.
In the 1980s the field of Computer Vision sobered up. It became clear
that a solid theoretical foundation was needed if any progress was
going to be made. Ambitions needed to be toned down too. Before
trying to replicate human abilities, it would be satisfying, and quite
hard enough, to try to emulate the visual skills of a lizard or a bee,
perhaps. It is the sheer breathtaking generality of the human ability
to see that has been so difficult to capture.
The most successful computer vision systems to date are
ones that are strongly specialised: sorting potatoes or inspecting
electrical circuit boards for instance.
Tracking agile motion:
a video sequence of a girl dancing to a Scottish reel is tracked over
about 10 seconds. Effective anticipation by the computer of likely
dance movements is crucial to enable it to ``see'' such agile
movements. This is done using the recently developed
``Condensation algorithm''. See the results for yourself
in the following
movie clip
(1.1 MByte MPEG). (Figure courtesy of M. Isard.)
A key factor in recent successes in making computers ``see''
moving objects has been in getting the computer to anticipate.
Suppose the computer is supposed to follow the trajectory of an
intruder, captured on a security video. It helps enormously if the
computer is programmed to expect a certain range of shapes
(``roundish'') for the intruder's head. That information can be
used to distinguish the true head from bits of visual
``flotsam'' in the room --- books on a shelf or pictures on the wall,
for instance. Not only shape but also movement can be anticipated by
the computer.
The intruder's
motion is expected to be smooth and largely horizontal (parallel to
the floor). The ``prior'' information can be used to get the computer to
concentrate, fixing on the moving head, without being continually
distracted by the bright and attractive pieces of flotsam nearby. This
can work successfully even when the motion is particularly vigorous,
as with the dancing girl shown above. Tracking the blowing leaf, shown next, is an
even more taxing problem because the leaf is camouflaged:
the rest of the bush conspires to make things as hard as
possible by imitating the selected foreground object.
Tracking motion against camouflage:
Tracking the outline of a foreground object like the leaf allows it to be
separated automatically from the
background. It can then be re-lit using computer graphics
techniques, as shown in
the movie clip
(2.3 MByte MPEG).
This is
a special effect which previously could only be
achieved by ``blue-screening'' from specially prepared video footage.
(Figure and movie courtesy of M. Isard.)
This whole art of programming anticipation into a computer is founded,
perhaps surprisingly, on the mathematics of probability. This is
because the shape of an object like the intruder's head is not known precisely in
advance, but is constrained to lie in a certain probable range.
Similarly the motion of the intruder is not known exactly in advance;
that would beg the question that the computer vision system is
supposed to answer. It is known, however, that only movements in a
certain range are at
all likely. Quantifying that range, in a way that is digestible by a
computer program, can be done neatly using the ``language'' of probability.
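To make the idea concrete, here is a minimal sketch of one cycle of a Condensation-style tracker, reduced to a single point moving along a line. It is an illustration under stated assumptions, not the implementation used for the results on this page: the constant-velocity motion model, the Gaussian measurement model, the noise values and the function names (condensation_step, estimate, run_demo) are all hypothetical. Each ``particle'' is one guess at the object's position and velocity; the prior on motion is expressed by how the guesses are moved forward, and the image evidence by how they are re-weighted.

```python
import math
import random


def condensation_step(particles, weights, observation, rng,
                      process_noise=1.0, obs_noise=2.0):
    """One select-predict-measure cycle over (position, velocity) particles."""
    n = len(particles)
    # Select: resample hypotheses in proportion to their previous weights.
    chosen = rng.choices(particles, weights=weights, k=n)
    # Predict: anticipate smooth motion (assumed constant velocity) plus jitter.
    predicted = []
    for pos, vel in chosen:
        vel = vel + rng.gauss(0.0, process_noise)
        predicted.append((pos + vel, vel))
    # Measure: weight each hypothesis by an assumed Gaussian likelihood.
    new_weights = [math.exp(-0.5 * ((observation - pos) / obs_noise) ** 2)
                   for pos, _ in predicted]
    total = sum(new_weights)
    if total == 0.0:
        # Every hypothesis is implausible; fall back to equal weights.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    return predicted, new_weights


def estimate(particles, weights):
    """Weighted mean position over all hypotheses."""
    return sum(w * pos for (pos, _), w in zip(particles, weights))


def run_demo(seed=0, n_particles=500, n_frames=30):
    """Track a simulated object moving 1.5 units per frame through noisy data."""
    rng = random.Random(seed)
    particles = [(rng.uniform(-10.0, 10.0), rng.gauss(0.0, 1.0))
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    true_pos = 0.0
    for _ in range(n_frames):
        true_pos += 1.5
        observation = true_pos + rng.gauss(0.0, 2.0)
        particles, weights = condensation_step(particles, weights,
                                               observation, rng)
    return true_pos, estimate(particles, weights)


if __name__ == "__main__":
    truth, guess = run_demo()
    print("true position:", truth, "estimated position:", round(guess, 2))
```

Because the tracker carries many weighted guesses rather than one, a single misleading measurement need not throw it off: guesses that remain consistent with both the anticipated motion and the later data survive resampling.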
Applications of the technology
Applications of the new technology in Computer Vision are multiplying.
Several examples are given here.
Actor-driven facial animation
A deforming face can be reliably tracked to relay information about the
variation over time of expression and head position to
a Computer Graphics animated face. The relayed expression can be
reproduced or systematically exaggerated. Tracking can
be accomplished in real-time, keeping pace with the rate at which new
video frames arrive (50 frames per second). This allows the actor to be
furnished with valuable visual
feedback. A modicum of make-up is applied to the face to make features
stand out.
An example of real-time reanimation is illustrated for the cartoon cat
shown. This was done using two
workstations, linked by network, one for visual tracking and one for
mapping tracked motion onto the cat animation channels and display.
Actor-animated cat:
Tracked facial motions drive input channels
to a cartoon cat, programmed
with some exaggeration of expression, as shown
in the following movie clip
(1.1 MByte MPEG). (Figure courtesy of B. Bascle.)
Human-computer interaction
Interacting with the real and the virtual:
hands tracked automatically can manipulate and indicate both real and
imaginary objects -- the so-called ``digital desk''
concept. Tracking works best if different kinds of motion are
anticipated, as this
movie clip
(1.0 MByte MPEG) shows. (Figure courtesy of M. Isard.)
Colour codes in the clip above indicate the kind of motion that is
being anticipated (red: smooth curve; green: shading; blue:
static). The method used to do this is an extension of the
Condensation algorithm to use ``mixed'' states. This means that, at
each instant, the algorithm reasons about which class of
motion is the current one. This helps both with anticipation,
making it more specific, and -- for free -- also classifies the motion.
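The mixed-state idea can be sketched in a few lines. In this toy version, which is an assumption-laden illustration rather than the method used in the clip, each hypothesis carries a discrete motion label alongside its position. The two labels (``static'' and ``moving''), the switching probability, the noise levels and all function names are hypothetical. Summing the weights per label gives the probability of each motion class, which is the classification that comes for free.

```python
import math
import random

# Hypothetical motion classes and the chance of keeping the current class.
MODES = ("static", "moving")
STAY_PROB = 0.9


def predict(particle, rng):
    """Possibly switch motion class, then move according to that class."""
    mode, pos = particle
    if rng.random() > STAY_PROB:
        mode = "moving" if mode == "static" else "static"
    if mode == "static":
        pos += rng.gauss(0.0, 0.1)        # assumed: almost no movement
    else:
        pos += 2.0 + rng.gauss(0.0, 0.5)  # assumed: steady drift of 2 units
    return mode, pos


def mixed_state_step(particles, weights, observation, rng, obs_noise=1.0):
    """Resample, predict with class-specific dynamics, then re-weight."""
    n = len(particles)
    chosen = rng.choices(particles, weights=weights, k=n)
    predicted = [predict(p, rng) for p in chosen]
    new_weights = [math.exp(-0.5 * ((observation - pos) / obs_noise) ** 2)
                   for _, pos in predicted]
    total = sum(new_weights)
    if total > 0.0:
        new_weights = [w / total for w in new_weights]
    else:
        new_weights = [1.0 / n] * n
    return predicted, new_weights


def mode_probabilities(particles, weights):
    """Total weight carried by each motion class."""
    probs = {mode: 0.0 for mode in MODES}
    for (mode, _), w in zip(particles, weights):
        probs[mode] += w
    return probs


def classify_demo(step_per_frame, seed=1, n_particles=400, n_frames=20):
    """Return the most probable motion class for a simulated trajectory."""
    rng = random.Random(seed)
    particles = [(rng.choice(MODES), 0.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    true_pos = 0.0
    for _ in range(n_frames):
        true_pos += step_per_frame
        observation = true_pos + rng.gauss(0.0, 0.3)
        particles, weights = mixed_state_step(particles, weights,
                                              observation, rng)
    probs = mode_probabilities(particles, weights)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print(classify_demo(2.0), classify_demo(0.0))
```

Hypotheses that carry the wrong label predict the wrong position, receive low weight and die out at the next resampling, so the surviving labels reflect the motion actually observed.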
Traffic monitoring
Roadside video cameras are already familiar in systems for automated speed
checks. Overhead cameras, sited on existing poles, can relay
information about the state of traffic --- its density and speed ---
and anomalies in traffic patterns. Contour tracking is particularly
suited to this task because vehicle outlines form a tightly
constrained class of shapes, undergoing predictable patterns of
motion. Already the state of California has sponsored research leading
to successful prototype systems. Work in our laboratory, monitoring
the motion of traffic along the M40 motorway near Oxford, is
illustrated here.
By automatically tracking cars, the emergency
services could obtain rapid warning of an accident
or traffic jam.
(Figure courtesy of S. Rowe, N. Ferrier, M. Isard.)
Vehicle velocity is estimated by recording the distance traversed by the base
of a tracked vehicle contour over a known elapsed time.
The measured distance is in image coordinates and this must be
converted to world coordinates to give true distance.
Analysis of speeds, in this example, showed clearly the typical
pattern of motorway traffic, with successively
increasing vehicle speeds towards the outside lane of the carriageway.
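The conversion from image to world coordinates can be illustrated as follows. This sketch assumes that the road is flat and that a 3x3 plane-to-plane mapping (a homography) from image pixels to road-plane metres is already known from calibration; the matrix values, the pixel positions and the function names (image_to_road, vehicle_speed) are hypothetical and are not taken from the M40 system.

```python
import math


def image_to_road(H, u, v):
    """Map an image point (pixels) to road-plane coordinates (metres)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity on the road plane")
    return x / w, y / w


def vehicle_speed(H, base_start, base_end, elapsed_s):
    """Speed in metres per second from two positions of the contour base."""
    x0, y0 = image_to_road(H, *base_start)
    x1, y1 = image_to_road(H, *base_end)
    return math.hypot(x1 - x0, y1 - y0) / elapsed_s


if __name__ == "__main__":
    # Assumed calibration: a plain scaling of 0.1 metres per pixel.
    H = [[0.1, 0.0, 0.0],
         [0.0, 0.1, 0.0],
         [0.0, 0.0, 1.0]]
    speed = vehicle_speed(H, (100.0, 200.0), (100.0, 500.0), elapsed_s=1.0)
    print(round(speed * 3.6, 1), "km/h")
```

With this made-up calibration, a contour base that moves 300 pixels in one second has covered 30 metres, which is 108 km/h. A real overhead camera views the road obliquely, so the bottom row of the matrix would not be (0, 0, 1) and equal pixel distances would correspond to different road distances near and far from the camera.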
Surveillance
A computer vision system
follows an intruder on a security camera.
The camera is mounted on a computer-controlled
pan-tilt platform driven by visual feedback from the tracked contour.
(Figure courtesy of S. Rowe, M. Isard.)
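One simple way to close such a visual feedback loop is proportional control: steer the platform so that the centre of the tracked contour moves towards the centre of the image. The sketch below is only an assumption about how this might be done, not a description of the system shown; the gain, the dead band, the sign conventions and the function names are hypothetical.

```python
def contour_centroid(points):
    """Mean of the contour's sample points, in pixels."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def pan_tilt_command(points, image_size, gain=0.05, deadband_px=5.0):
    """Pan and tilt rates (degrees per second) that re-centre the contour.

    Assumed conventions: positive pan turns the camera right, positive tilt
    turns it up, and image y increases downwards.
    """
    cx, cy = contour_centroid(points)
    width, height = image_size
    err_x = cx - width / 2.0
    err_y = cy - height / 2.0
    # Ignore small errors so the platform does not chatter around the centre.
    pan = gain * err_x if abs(err_x) > deadband_px else 0.0
    tilt = -gain * err_y if abs(err_y) > deadband_px else 0.0
    return pan, tilt


if __name__ == "__main__":
    # A square contour centred 100 pixels right of the image centre.
    head = [(400.0, 220.0), (440.0, 220.0), (440.0, 260.0), (400.0, 260.0)]
    print(pan_tilt_command(head, image_size=(640, 480)))
```

In the example the contour sits 100 pixels to the right of centre and level with it, so the command is a pan to the right and no tilt.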
Marker-free biometrics
Biometrics here means measuring
limb motion in order to analyse gait, as a tool for planning
corrective surgery. The tool is also useful for ergonomic studies and
anatomical analysis in sport. It is related to
the facial animation application above, but more taxing technically.
Marker-based systems exist and are commercially successful
as measurement tools in both biology and medicine, but it is attractive to
replace them with marker-free techniques. There is also a growing number of
applications in Computer Graphics for whole body animation.
Capture of the motion of an entire body from its
outline looks feasible but several problems remain to be
solved: the relatively large number of degrees of freedom of the
articulating body poses stability problems for trackers; the agility
of, say, a dancing figure requires careful treatment of ``occlusion''
--- periods during which some limbs and body parts are obscured by
others.
Tracking the articulated motion of a human body
is applicable to biometrics and clinical gait analysis as well as to
actor-driven whole body animation -- see
movie clip
(0.9 MByte MPEG). (Courtesy of J. Deutscher.)
Further Information
Last updated June 1999