Sunday, December 21, 2008


The relationship between the content and
the geometry of a photograph, and the
difficulty of separating them for analysis, has
caused anguish, or at least sustained puzzlement,
in more than a few writers. Roland Barthes,
for example, considered photography more or
less unclassifiable because it “always carries its
referent with itself” and there is “no photograph
without something or someone.”
As a philosophical issue, this applies to
the finished image and to reverse readings of
photographs, but in the context of making a
photograph, matters tend to be simplified by
knowledge of the task at hand. At some point
in the making of almost every photograph, the
photographer knows what the subject should
be and is solving the problem of how best to
make it into an image.
Content is the subject matter, both concrete
(objects, people, scenes, and so on) and abstract
(events, actions, concepts, and emotions). The
role it plays in influencing the design is complex,
because it has a specific attention value. Moreover,
different classes of subject tend to direct the
shooting method, largely for practical reasons. In
news photography, the fact of an event is the crucial
issue, at least for the editors. It is possible to shoot at
a news event and treat it in a different way, perhaps
looking for something more generic or symbolic,
but this then is no longer true news photography.
And if the facts rule the shooting, there is likely
to be less opportunity or reason to experiment
with individual treatments. Strong content, in other
words, tends to call for straight treatment—practical
rather than unusual composition.
Perhaps at this point the following tale from
British photographer George Rodger (1908-
1995), a co-founder of Magnum, would not be
out of place, even though fortunately most of
us will never find ourselves in such an extreme
situation. At the end of the Second World War,
Rodger entered Belsen concentration camp with
Allied troops. He later said, in an interview,
“When I discovered that I could look at the
horror of Belsen—4,000 dead and starving lying
around—and think only of a nice photographic
composition, I knew something had happened
to me and it had to stop.”


How people look at images is of fundamental
importance to painters, photographers,
and anyone else who creates those images. The
premise of this book is that the way you compose
a photograph will influence the way in which
someone else looks at it. While this is tacitly
accepted throughout the visual arts, pinpointing
the how and the why of visual attention has been
hampered by lack of information. Traditionally,
art and photography critics have used their
own experience and empathy to divine what
a viewer might or should get out of a picture,
but it is only in the last few decades that this
has been researched. Eye-tracking provides the
experimental evidence for how people look at
a scene or an image, and the groundbreaking
study was by A. L. Yarbus in 1967. In looking
at any scene or image, the eye scans it in fast
jumps, moving from one point of interest to
another. These movements of both eyes together
are known as saccades. One reason for them
is that only the central part of the retina, the
fovea, has high resolution, and a succession of
saccades allows the brain to assemble a total view
in the short-term memory. The eye’s saccadic
movements can be tracked, and the so-called
“scanpath” recorded. When this record is superimposed
on the view (such as a photograph), it shows how
and in what order a viewer scanned the image.
All of this happens so quickly (saccades last
between 20 and 200 milliseconds) that most
people are unaware of their pattern of looking.
Research, however, shows that there are different
types of looking, depending on what the viewer
expects to get from the experience. There is
spontaneous looking, in which the viewer is “just
looking,” without any particular thing in mind.
The gaze pattern is influenced by such factors as
novelty, complexity, and incongruity. In the case
of a photograph, the eye is attracted to things
that are of interest and to parts of the picture that
contain information useful for making sense out
of it. Visual weight, as we saw on the previous
pages, plays an important role; this is because
spontaneous looking is also influenced by “stored
knowledge,” which includes, among other things,
knowing that eyes and lips tell a great deal about
other people’s moods and attitudes.
A second type of looking is task-relevant
looking, in which the viewer sets out to look for
something or gain specific information from
an image or scene. In looking at a photograph,
we can assume that the viewer is doing this
by choice, and probably for some kind of
pleasure or entertainment (or in the hope that
the photograph will deliver this). This is an
important starting condition. Next come the
viewer’s expectations. For instance, if he or
she sees at first glance that there is something
unusual or unexplained about the image, this is
likely to cause a gaze pattern that is searching for
information that will explain the circumstances.
The classic study was by Yarbus in 1967, in which
a picture of a visitor arriving in a living room
was shown first without any instructions, and
then with six different prior questions, including
estimating the ages of the people in the image.
The very different scanpaths showed how the
task influenced the looking.
Other research in this area shows that most
people tend to agree on what are the most
informative parts of a picture, but that this is always
tempered by individual experience (personal stored
knowledge makes scanpaths idiosyncratic). Also,
most painters and photographers believe that they
can in some way control the way that other people
view their work (this is, after all, the entire theme of
this book), and research backs this up, in particular
an experiment (Hansen & Støvring, 1988) in
which an artist explained how he intended viewers
to look at the work and subsequent eye-tracking
proved him largely correct. Another finding
with interesting potential is that the scanpath that
emerges at first viewing occupies about 30% of the
viewing time, and that most viewers then repeat
it—re-scanning the same way rather than using
the time to explore other parts of the picture. In
other words, most people decide quite quickly
what they think is important and/or interesting
in an image, and go on looking at those parts.


It is so obvious as to be a truism that we look
most at what interests us. This means that
as we start to look at anything, whether a real
scene or an image, we bring to the task “stored
knowledge” that we have accumulated from
experience. Research in perception
confirms this: Deutsch and Deutsch (1963)
proposed “importance weightings” as a main
factor in visual attention. This is crucial in
deciding how photographs will be looked
at, because in addition to the composition,
certain kinds of content will do more than
others to attract the eye. Of course, filtering
out idiosyncrasy is difficult, to say the least, but
there are some useful generalizations. Certain
subjects will tend to attract people more than
others, either because we have learned to expect
more information from them or because
they appeal to our emotions or desires.
The most common high-attractant subjects
are the key parts of the human face, especially
the eyes and mouth, almost certainly because this
is where we derive most of our information for
deciding how someone will react. In fact, research
into the nervous system has shown that there are
specific brain modules for recognizing faces, and
others for recognizing hands—clear proof of how
important these subjects are visually.
Another class of subject that attracts the
eye with a high weighting is writing—again,
something of obvious high-information value.
In street photography, for example, signs and
billboards have a tendency to divert attention,
and the meaning of the words can add another
level of interest—consider a word intended to
shock, as is sometimes used in advertising. Even
if the language is unknown to the viewer (for
example, the image on page 42 for any non-
Chinese speaker), it still appears to command
attention. Ansel Adams, on the subject of a
photograph of Chinese grave markers, wrote,
“Inscriptions in a foreign language can have
a direct aesthetic quality, unmodified by the
imposition of meaning,” but they had that visual
quality in the first place only because they
represented a language.
As well as these “informational” subjects,
there is an even wider and harder-to-define class
that appeals to the emotions. These include sexual
attraction (erotic and pornographic images),
cuteness (baby animals and pets, for example),
horror (scenes of death and violence), disgust,
fashion, desirable goods, and novelty. Reactions
in this class depend more on the individual
interests of the viewer.
There is no way of accurately balancing
all of these weightings, but on an intuitive level
it is fairly easy, as long as the photographer is
conscious of the various degrees of attraction.
All of this content-based weighting also has to be
set against the complex ways in which the form
of the image—the graphic elements and colors—
directs attention.


One of the paradoxes of vision is that while
the image projected onto the retina obeys
the laws of optics and shows distant objects
smaller than nearer ones, the brain, given
sufficient clues, knows their proper size. And, in
one view, the brain accepts both realities—distant
objects that are small and full-scale at the same
time. The same thing happens with linear
perspective. The parallel sides of a road stretching
away from us converge optically but at the same
time are perceived as straight and parallel. The
explanation for this is known as “constancy
scaling” or “scale constancy,” a little-understood
perceptual mechanism that allows the mind to
resolve the inconsistencies of depth. Its impact on
photography is that the recorded image is purely
optical, so that distant objects appear only small,
and parallel lines do converge. As in painting,
photography has to pursue various strategies to
enhance or reduce the sense of depth, and images
work within their own frame of reference, not
that of normal perception.
Photography’s constant relationship with
real scenes makes the sense of depth in a picture
always important, and this in turn influences
the realism of the photograph. In its broadest
sense, perspective is the appearance of objects in
space, and their relationships to each other and
the viewer. More usually, in photography it is
used to describe the intensity of the impression
of depth. The various types of perspective and
other depth controls will be described in a
moment, but before this we ought to consider
how to use them, and why. Given the ability
to make a difference to the perspective, under
what conditions will it help the photograph to
enhance, or to diminish, the sense of depth?
A heightened sense of depth through strong
perspective tends to improve the viewer’s sense
of being there in front of a real scene. It makes
more of the representational qualities of the
subject, and less of the graphic structure.
The following types of perspective contain
the main variables that affect our sense of depth
in a photograph. Which ones dominate depends
on the situation, as does the influence that the
photographer has over them.
Linear perspective is, overall, the most prominent
type of perspective effect in two-dimensional
imagery. It is characterized by converging lines.
These lines are, in most scenes, actually parallel,
like the edges of a road and the top and bottom
of a wall, but if they recede from the camera,
they appear to converge toward one or more
vanishing points. If they continue in the image
for a sufficient distance, they do actually meet at
a real point. If the camera is level, and the view is
a landscape, the horizontal lines will converge on
the horizon. If the camera is pointed upward, the
vertical lines, such as the sides of a building, will
converge toward some unspecified part of the sky;
visually, this is more difficult for most people to
accept as a normal image.
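Under an ideal pinhole model, a point at horizontal offset X, height Y and distance Z in front of a level camera lands on the sensor at (f·X/Z, f·Y/Z), where f is the focal length. As Z grows, both coordinates shrink toward the image center, which is the convergence described above. The Python sketch below is a simplification, not a model of any real lens: it assumes no distortion, a camera aimed straight down a level road, and hypothetical values (a 6 m wide road, eye level at 1.5 m, a 50 mm lens).

```python
def project(x_m, y_m, z_m, focal_mm):
    """Ideal pinhole projection of a 3D point (camera at origin, looking along +Z).

    Returns sensor coordinates in millimeters relative to the image center.
    """
    return (focal_mm * x_m / z_m, focal_mm * y_m / z_m)


def road_edges(half_width_m, eye_height_m, distances_m, focal_mm):
    """Project matching points on the left and right edges of a straight, level road.

    Returns (distance, gap between the edges on the sensor, vertical position) tuples.
    """
    rows = []
    for z in distances_m:
        left = project(-half_width_m, -eye_height_m, z, focal_mm)
        right = project(half_width_m, -eye_height_m, z, focal_mm)
        gap = right[0] - left[0]  # horizontal separation of the two edges on the sensor
        rows.append((z, gap, left[1]))
    return rows


if __name__ == "__main__":
    # Hypothetical scene: 6 m wide road, eye level 1.5 m, 50 mm lens.
    for z, gap, y in road_edges(3.0, 1.5, [10, 20, 40, 80, 160], 50.0):
        print(f"{z:>4} m: edges {gap:6.2f} mm apart, {-y:5.2f} mm below the horizon")
```

Each doubling of distance halves the gap between the two edges and halves their distance below the horizon line, so both edges close in on the vanishing point at the image center, which lies on the horizon when the camera is level.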
In the process of convergence, all or most
of the lines become diagonal, and this, as we’ll
see on pages 76-77, induces visual tension and
a sense of movement. The movement itself adds
to the perception of depth, along lines that
carry the eye into and out of the scene. By
association, therefore, diagonal lines of all kinds
contain a suggestion of depth, and this includes
shadows which, if seen obliquely, can appear as
lines. So a direct sun, particularly if low in the
sky, will enhance perspective if the shadows it
casts fall diagonally. Viewpoint determines the
degree of convergence, and the more acute the
angle of view to the surface, the greater this is—
at least until the camera is close to ground level,
at which point the convergence becomes extreme
enough to disappear.
The focal length of the lens is another important
factor in linear perspective. Of two lenses aimed
directly towards the vanishing point of a scene,
the wide-angle lens will show more of the
diagonals in the foreground, and these will tend
to dominate the structure of the image more.
Hence, wide-angle lenses have a propensity to
enhance linear perspective, while telephoto lenses
tend to flatten it.
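A rough calculation suggests why the wide-angle view contains more foreground diagonals. For a level camera, the bottom edge of the frame meets flat ground at a distance of about eye height × focal length ÷ half the frame height, so a shorter lens takes in ground much closer to the camera, where receding edges are farthest apart and most strongly diagonal. The sketch assumes a 36 × 24 mm frame held horizontally, flat ground and an ideal rectilinear lens; the figures are illustrative only.

```python
SENSOR_HALF_HEIGHT_MM = 12.0  # assumption: 36 x 24 mm frame in landscape orientation


def nearest_visible_ground(eye_height_m, focal_mm, half_height_mm=SENSOR_HALF_HEIGHT_MM):
    """Distance (m) at which the bottom edge of a level camera's frame meets flat ground.

    Follows from the pinhole relation: image offset = focal * height / distance.
    """
    return eye_height_m * focal_mm / half_height_mm


if __name__ == "__main__":
    # Hypothetical eye level of 1.5 m, compared across common focal lengths.
    for f in (20, 35, 50, 100, 200):
        print(f"{f:>3} mm lens: ground visible from {nearest_visible_ground(1.5, f):5.1f} m")
```

With an eye level of 1.5 m, the 20 mm lens includes ground from 2.5 m away, while the 200 mm lens starts 25 m out, by which point the edges of a road have already drawn close together.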
Diminishing perspective is closely related to linear
perspective, and is in fact a form of it. Imagine a row of identical
trees lining a road. A view along the road would
produce the familiar convergence in the line
of trees, but individually they will appear to
be successively smaller. This is diminishing
perspective, and works most effectively with
identical or similar objects at different distances.
For similar reasons, anything of recognizable
size will give a standard of scale; in the
appropriate place in the scene, it helps to establish
perspective. Also associated with diminishing
perspective are placement (things in the lower
part of the picture are, through familiarity,
assumed to be in the foreground) and overlap
(if the outline of one object overlaps another,
it is assumed to be the one in front).
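In the same pinhole terms, the size of an object's image is inversely proportional to its distance: image height = focal length × real height ÷ distance. That is why a row of identical objects diminishes so regularly, and why anything of known size can act as a scale reference. The sketch below uses a hypothetical row of 5 m trees and a 50 mm lens, and ignores lens distortion.

```python
def image_height_mm(real_height_m, distance_m, focal_mm):
    """Height (mm) of an object's image on the sensor under an ideal pinhole model."""
    return focal_mm * real_height_m / distance_m


def distance_from_image(real_height_m, image_mm, focal_mm):
    """Invert the same relation: estimate distance (m) from an object of known size."""
    return focal_mm * real_height_m / image_mm


if __name__ == "__main__":
    # Hypothetical row of identical 5 m trees, each twice as far away as the last.
    for d in (20, 40, 80, 160):
        print(f"tree at {d:>3} m: {image_height_mm(5.0, d, 50.0):.3f} mm tall on the sensor")
```

Doubling the distance halves the image size. The second function runs the relation backwards, which is in effect what a viewer does when a figure of familiar height signals how far away it is.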
Atmospheric haze acts as a filter, reducing the
contrast in distant parts of a scene and lightening
their tone. Our familiarity with this effect (pale
horizons, for example) enables our eyes to use
it as a clue to depth. Hazy, misty scenes appear
deeper than they really are because of their strong
aerial perspective. It can be enhanced by using
backlighting, as in the example below, and by
not using filters (such as those designed to cut
ultraviolet radiation) that reduce haze. Telephoto
lenses tend to show more aerial perspective than
wide-angle lenses if used on different subjects,
because they show less of nearby things that have
little haze between them and the camera. Favoring
the blue channel when using channel mixing to
convert an RGB digital image to black and white
also accentuates the effect.
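A channel mixer forms each gray value as a weighted sum of the red, green and blue values. The sketch below uses plain Python tuples rather than any particular image library, and the two weight sets are illustrative assumptions (an even mix and a blue-heavy mix), not settings from any specific software. Because distant, hazy areas tend to be lighter and bluer than the foreground, giving blue more weight lifts them further and widens the tonal gap that reads as depth.

```python
def channel_mix(pixel, weights):
    """Convert one (red, green, blue) pixel, 0-255 each, to a gray level.

    Weights are divided by their sum so overall brightness is roughly preserved,
    and the result is rounded and clamped to the 0-255 range.
    """
    total = sum(weights)
    gray = sum(c * w for c, w in zip(pixel, weights)) / total
    return max(0, min(255, round(gray)))


EVEN_MIX = (1.0, 1.0, 1.0)    # assumption: equal weighting of the three channels
BLUE_HEAVY = (0.2, 0.2, 0.6)  # assumption: one way of "favoring the blue channel"

if __name__ == "__main__":
    # Hypothetical sample pixels: a warm foreground rock and a pale blue-gray distant ridge.
    foreground = (140, 110, 80)
    distant_ridge = (150, 170, 210)
    for name, w in (("even", EVEN_MIX), ("blue-heavy", BLUE_HEAVY)):
        fg, bg = channel_mix(foreground, w), channel_mix(distant_ridge, w)
        print(f"{name:>10}: foreground {fg}, distance {bg}, difference {bg - fg}")
```

With these sample values the blue-heavy mix darkens the warm foreground and lightens the hazy distance, so the difference between them grows.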
Apart from the lightening effect that haze has on
distant things, light tones appear to advance and
dark tones recede. So, a light object against a dark
background will normally stand forward, with a
strong sense of depth. This can be controlled by
placing subjects carefully, or by lighting. Doing
the reverse, as we saw on pages 46-47, creates a
figure-ground ambiguity.
Warm colors tend to advance perceptually and
cool colors recede. Other factors apart, therefore,
a red or orange subject against a green or blue
background will have a sense of depth for purely
optical reasons. Again, appropriate positioning
can be used as a control. The more intense the
colors, the stronger the effect, but if there is a
difference in intensity, it should be in favor of
the foreground.


Like rhythm, pattern is built on repetition,
but unlike rhythm it is associated with area,
not direction. A pattern does not encourage the
eye to move in a particular way, but rather to
roam across the surface of the picture. It has at
least an element of homogeneity, and, as a result,
something of a static nature.
The prime quality of a pattern is that it covers
an area, thus the photographs that show the
strongest pattern are those in which it extends
right to the edges of the frame. Then, as with
an edge-to-edge rhythm, the phenomenon of
continuation occurs, and the eye assumes that
the pattern extends beyond. The photograph
of the bicycle saddles illustrates this. In other
words, showing any border at all to the pattern
establishes limits; if none can be seen, the image
is taken to be part of a larger area.
At the same time, the larger the number of
elements that can be seen in the picture, the more
they read as a pattern rather than as a group of
individual objects. This operates up to a quantity
at which the individual elements become difficult
to distinguish and so become more of a texture.
In terms of the number of elements, the effective
limits lie between about ten and several hundred,
and a useful exercise when faced with a mass of
similar objects is to start at a distance (or with
a focal length) that takes in the entire group,
making sure that they reach the frame edges,
and then take successive photographs, closing in,
ending with just four or five of the units. Within
this sequence of images there will be one or two
in which the pattern effect is strongest. Pattern,
in other words, also depends on scale.
A pattern seen at a sufficiently large scale
takes on the appearance of texture. Texture is
the primary quality of a surface. The structure
of an object is its form, whereas the structure of
the material from which it is made is its texture.
Like pattern, it is determined by scale. The
texture of a piece of sandstone is the roughness
of the individual compacted grains, a fraction
of a millimeter across. Then think of the same
sandstone as part of a cliff; the cliff face is now
the surface, and the texture is on a much larger
scale, the cracks and ridges of the rock. Finally,
think of a chain of mountains that contains this
cliff face. A satellite picture shows even the largest
mountains as wrinkles on the surface of the earth:
its texture. This kind of repeating scale of texture
is related to fractal geometry.
Texture is a quality of structure rather than
of tone or color, and so appeals principally to
the sense of touch. Even if we cannot physically
reach out and touch it, its appearance works
through this sensory channel. This explains why
texture is revealed through lighting—at a small
scale, only this throws up relief. Specifically, the
direction and quality of the lighting are therefore
important. Relief, and thus texture, appears
strongest when the lighting is oblique, and when
the light is hard rather than soft and diffuse.
These conditions combine to create the sharpest
shadows thrown by each element in the texture,
whether it is the weave in a fabric, the wrinkles
in leather, or the grain in wood. As a rule, the
finer the texture, the more oblique and hard the
lighting needs to be for it to be seen clearly, except that
the smoothest of all surfaces are reflective,
such as polished metal, and texture is replaced
by reflection (see page 124).
Related to pattern and texture, but with
content playing a stronger role, is the idea
of many, as in a crowd of people or a large
shoal of fish. The appeal of huge numbers
of similar things lies often in the surprise of
seeing so many of them in one place and at
one time. The view of the Kaaba in Mecca,
seen from one of the minarets, for example,
is said to take in at least a million people, and
this fact is itself remarkable. Large numbers
congregating usually constitutes an event.
Framing to within the edges of the mass allows
the eye to believe that it continues indefinitely.


When there are several similar elements in
a scene, their arrangement may, under
special conditions, set up a rhythmic visual
structure. Repetition is a necessary ingredient, but
this alone does not guarantee a sense of rhythm.
There is an obvious musical analogy, and it makes
considerable sense. Like the beat in a piece of
music, the optical beat in a picture can vary from
being completely regular to having variations similar
to, for instance, syncopation.
Rhythm in a picture needs time and the
movement of the eye to be appreciated. The
dimensions of the frame, therefore, set some limits,
so that what can be seen is not much more than
a rhythmical phrase. However, the eye and mind
are naturally adept at extending what they see (the
Gestalt Law of Good Continuation) and, in a
photograph such as that of the row of soldiers on
page 183, readily assume the continuation of the
rhythm. In this way, a repeating flow of images is
perceived as being longer than can actually be seen.
Rhythm is a feature of the way the eye scans the
picture as much as of the repetition. It is strongest
when each cycle in the beat encourages the eye
to move (just as in the example to the right). The
natural tendency of the eye to move from side to
side (see pages 12-15) is particularly evident here, as
rhythm needs direction and flow in order to come
alive. The rhythmical movement is therefore usually
from side to side, as vertical rhythm is much less easily
perceived. Rhythm produces considerable strength
in an image, as it does in music. It has momentum,
and because of this, a sense of continuation. Once
the eye has recognized the repetition, the viewer
assumes that the repetition will continue beyond
the frame.
Rhythm is also a feature of repetitive
action, and this has real practical significance in
photographing work and similar activity. In the
main picture opposite, of Indian farmers in the
countryside near Madras winnowing rice, the
potential soon became apparent. The first picture
in the sequence is uninteresting but shows the
situation. The individual action was to scoop rice
into the basket and hold it high, tipping it gently
so that the breeze would separate the rice from
the chaff. Each person worked independently,
but inevitably two or more would be in the same
position at the same time. It was then a matter of
waiting for the moment in which three were in
unison, and finding a viewpoint that would align
them so that the rhythm has maximum graphic
effect. These things are never certain—someone
could simply stop work—but the possibility in a
situation like this is high.


We are conditioned to accept the idea
of a background. In other words, from
our normal visual experience, we assume that
in most scenes there is something that we look at
(the subject), and there is a setting against which
it stands or lies (the background). One stands
forward, the other recedes. One is important, and
the reason for taking a photograph; the other is
just there because something has to occupy the
rest of the frame. As we saw, this is an essential
principle of Gestalt theory.
In most picture situations, that is essentially
true. We select something as the purpose of the
image, and it is more often than not a discrete
object or group of objects. It may be a person,
a still-life, a group of buildings, a part of
something. What is behind the focus of interest
is the background, and in many well-designed
and satisfying images, it complements the subject.
Often, we already know what the subject is
before the photography begins. The main point
of interest has been decided on: a human figure,
perhaps, or a horse, or a car. If it is possible to
control the circumstances of the picture, the
next decision may well be to choose the
background: that is, to decide which of the
locally available settings will show off the
subject to its best advantage. This occurs so
often, as you can see from a casual glance at
most of the pictures in this book, that it
scarcely even merits mention.
There are, however, circumstances when
the photographer can choose which of two
components in a view is to be the figure and
which is to be the ground against which the
figure is seen. This opportunity occurs when
there is some ambiguity in the image, and it helps
to have a minimum of realistic detail. In this
respect, photography is at an initial disadvantage
to illustration, because it is hard to remove the
inherent realism in a photograph. In particular,
the viewer knows that the image is of something
real, and so the eye searches for clues.
Some of the purest examples of ambiguous
figure/ground relationships are in Japanese and
Chinese calligraphy, in which the white spaces
between the brush strokes are just as active
and coherent as the black characters. When
the ambiguity is greatest, an alternation of
perception occurs. At one moment the dark tones
advance, at another they recede. Two interlinked
images fluctuate backwards and forwards. The
preconditions for this are fairly simple. There
should be two tones in the image, and they should
contrast as much as possible. The two areas
should be as equal as possible. Finally, there
should be limited clues in the content of
the picture as to what is in front of what.
The point of importance here is not how to
make illusory photographs, but how to use or
remove ambiguity in the relationship between
subject and background. The two examples
shown here, both silhouettes, use the same
technique as the calligraphy: the real background
is lighter than the real subject, which tends to
make it move forward; the areas are nearly equal;
the shapes are not completely obvious at first
glance. The shapes are, however, recognizable,
even if only after a moment’s study. The figure/
ground ambiguity is used, not as an attempt to
create an abstract illusion, but to add some
optical tension and interest to the images.