Panel Title: "We do have ears too, you know!"

Panel Organizer Information

James K. Hahn

Associate Professor
The George Washington University
Department of Electrical Engineering and Computer Science
801 22nd Street NW
Washington, DC 20052

Currently on Sabbatical:
Department of Computer Science
College of Natural Science
Seoul National University
Kwanak-ku Sillim-dong
Seoul 151-742 Korea
Cellular Phone: +82-17-332-5760
Office Phone: +82-2-880-5360, +82-2-880-6757
Home Phone: +82-2-885-8270
Fax: +82-2-871-4912
Email:
hahn@seas.gwu.edu

James Hahn is currently a faculty member at the George Washington University. He is the director of the Institute for Computer Graphics and the Laboratory for Advanced Computer Applications in Medicine where he is leading research in motion control, sound, virtual environments, and image rendering. He has been involved in previous SIGGRAPH tutorials (including organizing two tutorials on sound) as well as authoring SIGGRAPH technical papers. His animations have been shown at the SIGGRAPH Electronic Theatre as well as a number of television programs and museum exhibits around the world. He received an MS in physics from the University of California, Los Angeles and an MS and a Ph.D. in computer and information science from the Ohio State University.

Summary Description of Panel

Sound is an often-ignored topic in an image-centered forum like SIGGRAPH. However, there is a definite relationship between images and sounds that must be understood to present a coherent and compelling sensory experience. In this panel we explore this relationship in a number of domains including cinema, animation, VR, and perceptual psychology and try to answer the question: where does sound belong?

 

Description of Panel

Sounds are an integral part of the environment. They are caused by motions in the world and in turn cause changes to the world. The characteristics of a sound are directly linked to the phenomenon that produced it, and sounds are further shaped by the environment in which they propagate. The energies in the visible and audible spectra that permeate the world are therefore strongly correlated. In computer graphics (image rendering, computer animation, virtual environments, etc.), the concentration has been on rendering the visible spectrum. When sounds have been added, the correlation between the two has generally been weak: sounds are usually generated independently of the events that supposedly caused them. The result is that what we see and what we hear come from two different "worlds," and the resulting confusion detracts from the total experience.

Complicating the issue is the fact that sounds tend to be impressionistic: literal sounds may not produce the desired emotional effect. Cinema has exploited this, layering many unrelated sounds into composites that evoke the emotions associated with the visuals. Hearing is also usually a background process while seeing is usually a foreground process; in this regard, it has been said that sounds enter our consciousness through the back door whereas images enter through the front door. This introduces all kinds of perceptual complexities, which can work for or against us.

Therefore, proper correlation between images and sounds can only come from consideration of a number of issues, including artistic expression, psychological perception, and technology. In this panel, we will explore this twilight zone between imagery and sound and how the two should be integrated to give a coherent and compelling sensory experience. The panelists come from a variety of backgrounds. Randy Thom is an Academy Award-winning sound designer and mixer who has worked on numerous major motion pictures. Ken Greenebaum has been exploring sound and how it can be integrated into multimedia from a technical point of view. James Hahn and Tapio Takala come from traditional computer graphics backgrounds and have explored how sound can be integrated into computer animation and virtual reality; they have the distinction of having written the only SIGGRAPH technical paper, and produced the only Electronic Theatre piece, dealing primarily with sound. Jim Ballas is a perceptual psychologist who has studied sound from the human point of view.

Even though SIGGRAPH is beginning to recognize the importance of sound, as evidenced by the number of sound-related forums, people who work with sound often feel alienated by the SIGGRAPH community. The focus at SIGGRAPH (as in the wider computer graphics, human-computer interaction, and multimedia communities) has also treated sound as just another element that needs to be "added" to the visuals. The focus of the proposed panel will be to bridge this gap between sound and images.

 

Why a Panel Session is Appropriate

We have offered two tutorials dealing with sound at previous SIGGRAPHs since 1994. This year, we felt that the topic had grown too broad, too multidisciplinary, and perhaps a little too controversial to be covered comprehensively in a tutorial. A panel format will allow a more relaxed presentation of opinions and personal views by the panelists, without the pressure to present a structured course on their topic. The panelists come from diverse backgrounds and hold different philosophies on how sound should relate to visuals. Such wide-ranging backgrounds and viewpoints will encourage a lively exchange of ideas among the panelists as well as the audience. SIGGRAPH attendees who are interested in sound often feel alienated; the open forum will also allow them to voice their views.

Description of Panel Format

We plan to have a traditional format with the organizer introducing each of the panelists who will present their views on the topic. This will be followed by a free discussion by the panelists and the audience. We will try to be novel in the presentations since we are dealing with a variety of senses and their relationship to each other. Some of the ideas we are exploring:

-We will consider audience participation during presentations (e.g. "Name that sound," matching sounds to recognizable movie scenes).

-We will conduct a brief psychological study related to sound, with the audience participating as subjects in real time.

-Part of the panel may be presented with the lights off.

-We will consider showing visuals "with and without sound."

-We will consider having a live music performance during the panel, perhaps with audience participation (e.g. Tapio Takala had an interactive orchestra installation in last year's SIGGRAPH Electric Garden).

 

Description of Panelists

Jim Ballas, Ph.D.

Engineering Psychologist
Code 5513
Naval Research Laboratory
Washington, DC 20375
Office Phone: (202) 404-7988
Fax: (202) 767-3172
Email: jballas@gslink.com

Jim Ballas is an Engineering Psychologist at the Naval Research Laboratory in Washington, DC. He previously held academic appointments at Georgetown University and George Mason University. He has been doing empirical research into the psychology of everyday sound perception for 20 years, investigating topics such as parallels between speech and everyday sound perception, the ecological frequency of different types of everyday sound, the effects of context on the interpretation of ambiguous sounds, and the information that sound naturally delivers during critical events such as airplane accidents. He served on an interdepartmental (Interior and Agriculture) technical advisory panel for the National Park Overflight Research project, which examined the effect of aircraft noise on park visitors. He is currently the President of the International Community for Auditory Display, and a member of the Acoustical Society of America.

Ken Greenebaum

Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
Office Phone: (425) 703-6873
Email: kgreene@microsoft.com

Ken Greenebaum has spent the past 10 years working on various aspects of the digital media field: audio, video, and graphics. He first developed embedded DSP solutions at DVP, an East Coast startup, and then spent many years creating applications and system software at Silicon Graphics, where he participated in the creation of many video and audio products, including VideoLab, the AL audio library, and the InPerson teleconferencing software. Most recently, Ken joined the Talisman team at Microsoft, where he has been responsible for the multimedia aspects of DirectAnimation, an animation engine allowing high-level description of behavior-based, time-synchronized media.

 

Tapio Takala, Ph.D.

Professor
Helsinki University of Technology
Department of Computer Science
02150 Espoo, Finland
Office Phone: +358-0-4511 (or 4513222)
Home Phone: +358-0-5023319
Fax: +358-0-4513293
Email: tta@cs.hut.fi

Tapio Takala is professor of multimedia and virtual reality at Helsinki University of Technology. His research interests range from computer-aided design and geometric modeling to music and electronic arts, with a special emphasis on animation and sound processing for virtual actors and environments. He has co-authored a SIGGRAPH paper on sound, contributed to an animation shown at the SIGGRAPH Electronic Theatre, and participated in a previous SIGGRAPH tutorial on sound. He leads the DIVA digital virtual acoustics research group, which presented an interactive virtual orchestra performance at the SIGGRAPH '97 Electric Garden.

Randy Thom

President
Ear Circus
12 Manor Terrace, Mill Valley, CA 94941
Home Phone: (415) 380-8727
Office Phone: (415) 332-9336
Fax: (415) 332-9337
Email: RandyThom@aol.com

Randy Thom has been the President of Ear Circus, a sound art production company, since 1989, and was a staff sound designer/mixer at Lucasfilm from 1983 to 1989. He received the Academy Award for "The Right Stuff" and earned seven Academy Award nominations for Best Sound for "The Right Stuff," "Never Cry Wolf," "Return Of The Jedi," "Wild At Heart," "Backdraft," and "Forrest Gump." He received an Emmy nomination for Best Sound for the "Ewoks Children's Special" and a Grammy nomination for Best Spoken Word Recording for "War Of The Worlds, 50th Anniversary." He wrote and directed "Dream Train," a Dolby Inc. promotional film, and authored the textbook "Audiocraft," used in over 100 universities. He is a member of the Academy of Motion Picture Arts and Sciences and of the British Film Academy.

 

Position Statement From Panelists

Jim Ballas

It is good to talk about integrating sound and graphics at a conference such as SIGGRAPH, but the incorporation of sound will always be lacking unless one is willing to think "sound first," especially in scenarios where sound provides critical information. For example, a recent analysis of NTSB accident dockets provides evidence that sound can be an important cue to pilots of impending accidents, especially accidents that the pilot hasn't caused. Often, the events that were conveyed by sound in these accidents couldn't have been represented graphically. It is generally acknowledged that sound usefully conveys information in everyday life. But in understanding how to integrate sound and graphics, it is also important to know that the sound we hear most often is sound of unseen mechanical events (e.g., fans and engines running) and of moving objects and people (e.g., traffic and footsteps). Thus the graphics community should be open to using sound without a correlated graphic, and willing to address the technical problems in tightly integrating moving images and sounds. Finally, integrating sound is a significant sound design task; in this regard, a naive approach (e.g., "let's add a sound effect here") is a shortcut to failure. Research has shown that no single property of a sound can account for even a simple perception such as how long it takes to identify the sound. For this simple perception, the factors involved include the spectral properties of the sound, the causal uncertainty of the type of sound, the context, the typicality of the sound (does it match a mental stereotype?), and its ecological occurrence. So sound rendering should be taken seriously. Examples of how some of these factors affect sound identification will be demonstrated in an interactive part of my presentation.

Ken Greenebaum

I wholeheartedly agree with James' premise for this panel, that in real life auditory and visual phenomena arise from the same physical source and are thereby strongly correlated. Unfortunately, it is much too difficult to achieve this level of visual and aural integration using current production tools and techniques.

I believe that tools will arise that allow creative folk to set up physical simulations containing many objects of various natures. These objects will then interact based on their inherent behaviors, simulated physics, and creative, directorial input. Sound as well as images will be rendered based on the physical interactions, friction, vibration, resonance, lighting, surface/volumetric phenomena, movement, and other attributes of the objects and their environment.

Basically, these tools will enable people to easily reproduce the rich interaction that every child so easily enjoys. Build a tall building out of colorful wooden blocks. Roll a big, heavy ball at it. Enjoy the intricate movement and sounds of the collapse! The sounds produced won't be pre-recorded snapshots but will be dynamically synthesized from the physical interactions, in the same way that the highlights and shadows of the images are based on the evolving model. Description will play a very powerful role. Sounds and textures will not be captured, downloaded, and re-used, but will rather be parametrically synthesized so as to always be appropriate for the present dynamic.
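To make this concrete, here is a minimal sketch of collision-driven synthesis (my own illustration in Python; the material modes, parameter values, and function names are assumptions, not DirectAnimation's actual API). An impact sound is built as a sum of damped sinusoids whose amplitude follows the impact velocity, so the sound is always derived from the simulated event:

import math

SAMPLE_RATE = 44100

# Each material is modeled by a few vibration modes:
# (frequency in Hz, damping coefficient, relative amplitude).
WOOD_MODES = [(220.0, 8.0, 1.0), (560.0, 12.0, 0.5), (1180.0, 20.0, 0.25)]

def impact_sound(modes, impact_velocity, duration=0.5):
    """Return samples for one collision, scaled by impact velocity."""
    samples = []
    for i in range(int(duration * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        s = sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                for f, d, a in modes)
        samples.append(impact_velocity * s)
    return samples

# A physics engine's collision callback would drive the synthesis, so
# the sound matches the simulated event rather than a canned recording:
samples = impact_sound(WOOD_MODES, impact_velocity=2.0)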

At SGI I tried for years to promote the benefits of integrating sound into the rich, database-driven simulation engines of products like Performer and Inventor. When I was recruited by Microsoft's then-forming DirectAnimation team at SIGGRAPH '96, I jumped at the opportunity to design the multimedia aspects, sound and video, of this dynamic, behavior-based media integration engine.

I will discuss the first steps my team and I have taken toward achieving the above goals, and briefly speculate on what may come in the future. Real-time demos will illustrate the behavioral approach to integrated audio and video animation.

Tapio Takala

I have been interested in sound as an integrated part of the visuals. The basic premise is that sound and visuals come from the same simulated world, be it for computer animation or for virtual reality. Therefore, the proper way to approach the creation of the senses is first to model the phenomena. Then, based on this basic model, we render the "scene" using two pipelines, visual and auditory. The way the two senses are "rendered" is completely parallel, with the same level of importance; in fact, many of the technical problems and solutions associated with the two renderings are the same. We start with an abstract basic model from which we derive the specific auditory or visual models of the scene. The scene is then transformed into either camera coordinates or microphone coordinates; in the latter case, the transformation is in the time and amplitude domains. The energy associated with sound or light is then traced through the environment, and finally sampled to generate the final image or sound.
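As a small, runnable illustration of the auditory counterpart of the camera transform (my own sketch in Python, not code from our systems), a point source's position maps into a propagation delay and an amplitude gain at the microphone:

import math

SPEED_OF_SOUND = 343.0  # metres per second in air

def transform_to_microphone(source_pos, mic_pos):
    """Map a point source into microphone 'coordinates': distance
    becomes propagation delay (time domain), and the inverse-distance
    law becomes a gain (amplitude domain)."""
    distance = math.dist(source_pos, mic_pos)
    delay = distance / SPEED_OF_SOUND
    gain = 1.0 / max(distance, 1.0)  # clamped to avoid blow-up at the mic
    return delay, gain

# Example: a source 10 m away arrives about 29 ms late at 1/10 amplitude.
print(transform_to_microphone((10.0, 0.0, 0.0), (0.0, 0.0, 0.0)))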

As an example of this approach, at last year's SIGGRAPH we had an installation in the Electric Garden. The installation was a room with large-screen walls in which animated virtual players hold different musical instruments. The visitor, wearing data gloves, conducts a musical performance, leading the tempo with one hand and, with the other, directing aspects of the performance (a string crescendo, for example). The players show features of human behavior: they pay attention when the conductor begins, and they continue playing for a while if the conductor ceases, but soon lapse into playing nonsense. Through amplified speakers, visitors also experience the acoustics of the surrounding virtual concert hall. Alternative acoustic environments (open space, concert hall, church) and pieces in different musical styles can be selected from a menu.

Various techniques are used to produce this fully synthetic experience: rule-based agents for players' behavior, neural networks for response to the conductor, physical instrument modeling for the sound synthesis, real-time reverberation simulation for the hall acoustics, and auralization filters for 3D sound.
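As one concrete, textbook-style example of physical instrument modeling, here is a hedged sketch of the classic Karplus-Strong plucked string in Python (an illustration only; the DIVA group's actual instrument models are more sophisticated):

import random

SAMPLE_RATE = 44100

def pluck(frequency, duration=1.0, damping=0.996):
    """Karplus-Strong synthesis: a noise burst circulates in a delay
    line whose length sets the pitch; the two-point average acts as a
    low-pass filter, so high harmonics decay faster, as on a real string."""
    period = int(SAMPLE_RATE / frequency)
    delay_line = [random.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for _ in range(int(duration * SAMPLE_RATE)):
        first = delay_line.pop(0)
        delay_line.append(damping * 0.5 * (first + delay_line[0]))
        out.append(first)
    return out

samples = pluck(440.0)  # one second of an A4 string tone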

Randy Thom

What I propose is that the way for a filmmaker to take advantage of sound is not simply to get high-quality recordings on location, or to hire a sound designer to fabricate sounds, but rather to design the film with sound in mind.

It is a commonly accepted myth that the time for filmmakers to think seriously about sound is at the end of the filmmaking process, when the structure of the movie is already in place. A dramatic film that really works is, in some senses, almost alive: a complex web of interconnected elements, almost like living tissue, which despite their complexity work together to present a more-or-less coherent set of behaviors. It doesn't make any sense to set up a process in which the role of one craft, sound, is simply to react, to follow, to be pre-empted from giving feedback to the system it is a part of.

Why not include composers and sound designers in pre-production discussions about ways to approach storytelling? If your reaction to this is "so, what do you expect, isn't it a visual medium?" there may be nothing I can say to change your mind. My opinion is that film is definitely not a "visual medium." I think if you look closely at, and listen to, a dozen or so of the movies you consider to be great, you will realize how important a role sound plays in many if not most of them. It is even a little misleading to say "a role sound plays," because when a scene is really clicking, the visual and aural elements are working together so well that it is nearly impossible to distinguish them. Filmmakers dream of creating those moments.

Telling a film story, like telling any kind of story, is about creating connections between characters, places, objects, experiences, and ideas. You try to invent a world that is complex and many-layered, like the real world. But unlike most of real life (which tends to be badly written and edited), in a good film a set of themes emerges that embodies a clearly identifiable line or arc, which is the story.

It seems to me that one element of writing for movies stands above all others in terms of making the eventual movie as "cinematic" as possible: Establishing point of view. Nearly all of the great sound sequences in movies have a strong element of POV. The audience experiences the action through its identification with characters. The writing needs to lay the groundwork for setting up POV before the actors, cameras, microphones, and editors come into play. Each of these can obviously enhance the element of POV, but the script should contain the blueprint.

Characters need to have the opportunity to listen. Each character in a movie, especially each of the principal characters, is like a filter through which the audience experiences the events of the story. When a character looks at an object, we, the audience, are looking at it, more or less through his eyes. The way he reacts to seeing the object (or doesn't react) can give us vital information about who he is and how he fits into this situation. The same is true for hearing. If there are no moments in which our character is allowed to hear the world around him, then the audience is deprived of one dimension of HIS life.

Statement of Desire to Participate in Panel Augmentation Programs (Online Panels, CAL, Interactive Presentation)

We are very much open to the idea of taking advantage of any panel augmentation programs available. Specifically:

-We will consider interactive presentations, e.g. "Name that sound," matching sounds to recognizable movie scenes, as well as a brief psychological study related to sound with the audience participating as subjects in real time.

-Some of the panelists will install programs in the CAL that the audience can play with before and after the panel.