DIRECTANIMATION - A NEW APPROACH TO MULTIMEDIA

Ken Greenebaum

Microsoft Corporation

1 Microsoft Way

Redmond, WA 98052

425-703-6873

kgreeene@microsoft.com

 


ABSTRACT

Traditionally, libraries that expose computer media capabilities to programmers are both monolithic (they control only one media type) and imperative (the programmer commands the computer to perform discrete steps). These libraries control one medium very well but are difficult to integrate. For example, Silicon Graphics has the GL graphics library, the AL audio library, the CL compression library, and the VL1 video library. DirectAnimation (referred to in this paper as DA) takes a very different approach. Inspired by the TBAG2 system, DA is a multimedia engine and API that fully integrates 2D, 3D, video and audio media with an implicit timeline via dynamic constructs called behaviors. Programmers describe animation using a declarative functional language paradigm. DA allows content to be authored either for standalone applications or the Web.

INTRODUCTION

This paper describes the fundamental DA concept of behaviors, the motivation for the declarative, functional-based programming approach, media integration, and the concept of implicit time/media synchronization. It emphasizes the treatment of audio and demonstrates how easily audio can interact with other media elements and how simply advanced techniques such as parametric synthesis, 3D spatialization, and synchronization are available to the developer.

Behaviors

Behaviors are fundamental to DA. A behavior is simply a described value that is defined with respect to time or other behaviors. All behaviors belong to one of DA’s fundamental types (described later). Each fundamental behavior type has a number of dynamic parameters. These parameters have default values but can be defined with respect to time or other behaviors. Audio parameters include rate, phase, gain and pan.
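The behavior model can be sketched in a few lines of Python. This is an illustrative model only, not the DA API: a behavior is treated as a function from time to a value, and a parameter (here, gain) can itself be a behavior sampled at the same instant.

```python
import math

def const(v):
    """A static behavior: the same value at every time."""
    return lambda t: v

def local_time(t):
    """The identity behavior: its value at time t is t itself."""
    return t

def gain(sound, g):
    """Derive a new sound behavior whose gain parameter is itself
    a behavior, sampled at the same instant as the sound."""
    return lambda t: sound(t) * g(t)

# A 1 Hz sine "sound" whose gain ramps linearly with local time.
tone = lambda t: math.sin(2 * math.pi * t)
ramped = gain(tone, local_time)
print(ramped(0.25))   # sin(pi/2) * 0.25 = 0.25
```

Because every behavior is just a time-indexed value, composing behaviors is ordinary function composition, which is what makes the parameter wiring described above so uniform.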

Some primitive behaviors are system constants (for example, silence, emptyImage, and red) but most are the result of declaration, a synthesis operation, or media import. In turn, most behaviors are actually composites of other behaviors. Behaviors dependent on outside events, such as a user clicking the mouse, are called reactive behaviors. Here are some examples of audio behaviors (all examples require Microsoft Internet Explorer 4.x):

Primitive, static behavior:

rawSound = sinSynth.rate(800); // 800Hz sine wave

Listen to this example.

Dynamic behavior based on rawSound (rawSound rate varying from ½ to 1½ nominal every second):

siren = rawSound.rate(div(add(toBvr(2), sin(localTime)), toBvr(2)));

Listen to this example.
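The siren's rate expression, (2 + sin(localTime)) / 2, can be checked numerically. The following is a plain Python sketch of the arithmetic, not DA code; it shows the rate oscillating between ½ and 1½ times nominal.

```python
import math

def siren_rate(t):
    # rate(t) = (2 + sin(t)) / 2, oscillating between 0.5 and 1.5
    return (2 + math.sin(t)) / 2

print(siren_rate(0.0))              # 1.0 (nominal rate)
print(siren_rate(math.pi / 2))      # 1.5 (maximum)
print(siren_rate(3 * math.pi / 2))  # ~0.5 (minimum)
```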

Reactive behavior with mouse button interaction:

alarm = (SoundBvr)until(silence, leftButtonDown, siren);

Listen to this example.
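The semantics of until can be sketched as follows. This is hypothetical Python modeling the combinator's meaning, not the DA API: the composite behaves as its first argument until the event fires, then switches to the second.

```python
def until(b1, event, b2):
    """Reactive composition: behave as b1 until `event` fires,
    then as b2. `event` maps a query time to the time it fired,
    or None if it has not fired yet (a modeling assumption)."""
    def composite(t):
        fired_at = event(t)
        if fired_at is not None and t >= fired_at:
            return b2(t)
        return b1(t)
    return composite

silence = lambda t: "silence"
siren = lambda t: "siren"
# Hypothetical event: the left button was pressed at t = 2.0 seconds.
left_button_down = lambda t: 2.0 if t >= 2.0 else None

alarm = until(silence, left_button_down, siren)
print(alarm(1.0))  # silence
print(alarm(3.0))  # siren
```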

Declarative approach

While most media libraries are designed using the imperative paradigm, DA uses a functional approach. The advantages of functional programming are well presented in John Backus's Turing Award lecture3, and its fundamental principles are described in introductory texts4.

DA’s declarative approach makes it very straightforward for the programmer to build up complex relationships and animations without worrying about side effects or synchronization. Compound behaviors such as a spider climbing up and down a web hanging from a swinging pendulum are easy to describe but difficult to simulate in imperative systems.

Because behaviors are free of side effects and because the DA engine has the complete description of the universe, it can provide a broad range of optimizations that the end user never has to consider. This makes DA animations potentially more efficient than programs written directly to the foundation libraries. For example, the engine currently optimizes out static behaviors and performs dirty-rectangle optimizations. Other techniques that are only possible with a declarative approach are being investigated. If the functional programming paradigm were not enforced and behaviors were allowed to produce side effects, it would not be possible for the engine to 'skip' any work.
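One way such an optimization can work is constant folding over static behaviors. The following Python sketch is illustrative only (the DA engine's actual machinery is not shown here): because behaviors have no side effects, a combination of two static behaviors is itself static and can be evaluated once rather than once per frame.

```python
def const(v):
    """A static behavior, tagged so combinators can recognize it."""
    f = lambda t: v
    f.is_static = True
    return f

def add(a, b):
    """Combine two behaviors. Side-effect freedom makes it safe to
    fold two static operands into a single precomputed constant."""
    if getattr(a, "is_static", False) and getattr(b, "is_static", False):
        return const(a(0) + b(0))   # evaluated once, not per frame
    return lambda t: a(t) + b(t)

folded = add(const(2), const(3))
print(folded(99.0), folded.is_static)  # 5 True
```

If either operand could have side effects, skipping the per-frame evaluation would change the program's meaning, which is exactly the point made above.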

Media Integration

DA offers rich multimedia capabilities. It exposes sound, image, and 3D geometry behavior types as well as a host of related behaviors including number, string, color, point and vector, transform, camera, microphone, font, line and spline. Movies are exposed as time-varying 2D images. Image behaviors can be used as animated textures.

Each behavior type is actually abstract, and represents a variety of related media. In the current implementation, the Sound behavior actually includes sampled PCM sound, parametrically synthesized sound, and MIDI sound. Each of these sounds is accessed in the same way as the others. They all have the same parameters (gain, rate5, and phase6), and can be used interchangeably.

Sounds can be inserted into a 3D geometry to form sound geometries. These then get transformed into 3D space in the same manner as any other geometry. Attaching a sound to a geometry is somewhat akin to providing geometries with textures or material properties. However, because DA embeds sound behaviors, the sounds retain their dynamic properties and continue to function.

3D geometries rendered against 3D cameras yield image behaviors. Geometries rendered against 3D-positioned microphones yield spatial audio behaviors. These rendered behaviors can further be used to develop more complicated behaviors. For example, it is possible to texture-map the image behavior from a three-dimensional rendering of a geometry onto the screen of a television in an animation, and to insert the spatialized sound behavior into the television's speaker geometry. This is basically embedding one 3D world into another!

Implicit Time Synchronization

Synchronization between media types can be a complicated and error-prone exercise when using conventional media libraries. In DA, time is implicit. The programmer describes the relationships between the media behaviors and the engine assures synchronization. A simple example is an animation of a bouncing ball. In DA, a programmer could define a 3D point behavior describing the position of the ball over time. This behavior can be as complicated as the programmer desires. It might include gravity, a model of material hardness properties, and even interaction. The programmer would then create an image behavior based on position to display the ball and its environment and a sound behavior. The sound behavior, also based on position, would detect collisions and play an appropriate synthesized collision sound based on, for example, the speed and angle of the interaction. The resulting animation achieves synchronization between the image of the ball and appropriate collision sounds caused by the ball interacting with the surface without the programmer specifically relating the image and sound rendering.
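The bouncing-ball idea can be sketched with a toy Python model (not DA code, and with assumed physical constants): a single position behavior drives both the displayed height and the collision test that triggers sound, so image and sound are synchronized by construction.

```python
import math

GRAVITY = 9.8
H0 = 1.0                            # assumed drop height in meters
T = math.sqrt(2 * H0 / GRAVITY)     # time from apex to impact

def ball_height(t):
    """Position behavior for an idealized lossless bounce:
    apexes at t = 0, 2T, 4T, ...; impacts at t = T, 3T, ..."""
    phase = t % (2 * T)
    d = min(phase, 2 * T - phase)   # time from the nearest apex
    return H0 - 0.5 * GRAVITY * d * d

def collision_sound(t, eps=1e-3):
    """Sound behavior derived from the SAME position behavior:
    it fires exactly when the displayed ball reaches the floor."""
    return "thud" if ball_height(t) < eps else None

print(ball_height(0.0))        # 1.0 (at the apex)
print(collision_sound(0.0))    # None (no impact, no sound)
print(collision_sound(T))      # thud (impact and sound coincide)
```

Nothing here explicitly relates the image rendering to the sound rendering; both simply sample the same underlying behavior, which is the essence of implicit synchronization.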

James Hahn used this paradigm quite effectively in his TimbreTree work. The animation "Blowing in the Wind" animated a model of chimes excited by simulated wind. The rendered images and generated sounds were all controlled by the physical model of the wind causing the chimes to collide. DA allows this powerful approach to be used in real-time. This is illustrated in several samples, where a DA simulation behavior controls different aspects of the media types used in the animation, even allowing for user input.

IMPLEMENTATION

DA is built on top of existing media resources and, as much as possible, leverages native platform capabilities. In this way, DA benefits from the constant advancement in the underlying technologies.

 

Language Integration

The DA API is exposed via COM (Microsoft’s component object model). This allows DA to be easily accessed from many popular high-level languages. DA is currently being used from Java, JScript, C, C++, VBScript, and Visual Basic. Class libraries are provided to wrap the functionality and allow a native integration into Java, hiding COM. Similar libraries are anticipated for C++.

Very simplified access to the DA engine will be provided directly from HTML via the DirectAnimation End User Controls. In addition, VBScript, JScript, or Java embedded in HTML web pages expose the full functionality of the DA API.

Implementation

DA is itself implemented in C++, utilizing functional, non-side effecting techniques within the engine. Media capabilities are mainly provided by the DirectX foundation libraries: DirectSound, DirectDraw, and Direct3D, with higher level filter graph, audio and video format decoding provided by DirectShow.

AUDIO CAPABILITIES

Unlike other animation systems, audio is not incorporated as an afterthought in DA but is a full-fledged element of the system. This section will describe some of DA’s audio capabilities.

Audio Sample Importation

Audio importation is accomplished via the DirectShow streaming architecture. This extensible mechanism allows DA to import PCM audio files in any format for which a DirectShow CODEC is registered. DirectShow's extensibility mechanism allows DA to automatically take advantage of any CODECs the programmer has added to the system. Because DA imports URLs (Uniform Resource Locators), audio can easily be streamed over the net using low-bandwidth CODECs and protocols such as Progressive Networks' RealAudio.

Audio Mixing

DA can mix audio of any supported type, including synthesized or sampled PCM, and MIDI, automatically performing any necessary format conversions. In addition, a sound, like any other behavior, can be multiply instanced. Each sound instantiation has independent dynamic parameters. One sound can be imported and then used simultaneously with different parameters. Recursively mixing sound is a commonly used DA semantic. Consider this example that adds a voice each time the mouse button is clicked:

SoundBvr chorus = SoundBvr.newUninitBvr();

chorus.init((SoundBvr)until(silence, leftButtonDown, mix(voice, chorus)));

Listen to this example.
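The effect of the self-referential mix can be modeled conceptually in Python (this is not DA code, and it simplifies away reactive timing): each click re-enters the definition, mixing one more copy of the voice into the running sound.

```python
silence = lambda t: 0.0
voice = lambda t: 1.0   # stand-in for one voice's amplitude

def mix(a, b):
    """Mix two sound behaviors by summing their values."""
    return lambda t: a(t) + b(t)

def chorus(click_times):
    """Model of until(silence, leftButtonDown, mix(voice, chorus)):
    one recursion step per recorded click, so after n clicks the
    output is n voices mixed together."""
    out = silence
    for _ in click_times:
        out = mix(voice, out)
    return out

c = chorus([1.0, 2.5, 4.0])   # three hypothetical click times
print(c(5.0))  # 3.0 -> three voices sounding
```

In DA itself, newUninitBvr/init exists precisely so the behavior's own name (chorus) can appear inside its definition.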

Spatial Audio

As mentioned earlier, DA exposes 3D spatial audio in a very straightforward manner. By embedding sounds in geometry, sounds can be transformed into space and then ‘perceived’ by 3D microphones. The microphones too are behaviors and can be moved and oriented in 3D space.

Listen to an example.
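A minimal sketch of what rendering a sound geometry against a microphone must compute is shown below. This is illustrative Python under simple assumed models (1/distance gain falloff, lateral-offset panning), not DA's actual spatialization.

```python
import math

def spatialize(source, mic):
    """Given source and microphone positions (x, y, z), derive a
    gain (1/distance falloff, clamped to 1.0) and a pan value
    (-1 = hard left, +1 = hard right) from the geometry alone."""
    dx, dy, dz = (s - m for s, m in zip(source, mic))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    gain = min(1.0, 1.0 / max(dist, 1e-6))
    pan = max(-1.0, min(1.0, dx / max(dist, 1e-6)))
    return gain, pan

# A source 2 m to the listener's right.
g, p = spatialize((2.0, 0.0, 0.0), (0.0, 0.0, 0.0))
print(g, p)  # 0.5 1.0 -> half gain, hard right
```

Because the positions are themselves behaviors in DA, moving either the sound geometry or the microphone automatically re-derives these parameters over time.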

Low Bandwidth Web Animation

Many of DA’s audio features make it well suited for low bit-rate web-based applications, delivering high quality animation and sound to computers connected by slow modems. Data can be compactly described and then shipped to the client to synthesize.

Low bit-rate network streaming solutions such as Voxware's 2400 bps speech CODEC allow speech of indefinite length to be easily incorporated into an animation and played instantly, without waiting for the data to first be downloaded.

The use of MIDI, a highly compact music representation, allows animations to provide background music or extremely lightweight sound effect cues. MIDI can be spatialized and parametrically manipulated the same as any other sound.

Parametric Synthesis

Parametric synthesis is a very powerful technique. It allows complex sounds to be described and synthesized on the host instead of transmitting the sampled sound over slow network connections. The sinSynth primitive behavior is our proof of concept. We envision 1/f noise, collision, and wind-turbulence synthesizers, as well as parametrically controlled signal processing, in future versions of DA.
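At its core, a primitive like sinSynth amounts to generating PCM samples from a compact parameter. The following Python sketch illustrates the idea under assumed defaults (44.1 kHz sample rate); it is not DA's implementation.

```python
import math

def sin_synth(freq_hz, sample_rate=44100, duration=0.01):
    """Parametric synthesis: produce PCM samples of a sine wave
    from a tiny description (just a frequency), instead of
    shipping sampled audio over a slow connection."""
    n = int(sample_rate * duration)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

samples = sin_synth(800.0)   # the 800 Hz tone from the earlier example
print(len(samples))          # 441 samples for 10 ms at 44.1 kHz
print(samples[0])            # 0.0 (sine starts at zero phase)
```

A few bytes of parameters replace kilobytes of sampled data, which is what makes the technique attractive for low-bandwidth delivery.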

DA provides a rich environment for prototyping and developing complex behaviors such as parametric synthesizers. In some of our example content, we have experimented with dynamic layered ambient sounds built on top of DA primitives. Seashore, underwater diving, and city ambient sound synthesizers have been created.

Traditional animations contain pre-authored looped ambiences that quickly become repetitive. The resulting sound is often highly compressed to very low quality to combat the excessive size caused by the length needed to overcome this repetitiveness. Dynamic layering instead builds ambiences from a small number of tiny but high-quality sampled sounds. These samples are not pre-processed but rather are used 'dry' and normalized. The ambient behavior engine then dynamically transforms and mixes these components together with randomizing factors. The result is a dynamically changing sound that never loops and can be synchronized with outside events such as user interaction.
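Dynamic layering can be sketched as a randomized scheduler (a toy Python model with assumed parameter ranges, not the actual ambient behavior engine): a few small dry samples are repeatedly retriggered with randomized gain, rate, and onset time, so the composite never repeats exactly.

```python
import random

def layered_ambience(layers, duration, seed=0):
    """Schedule randomized instances of a few small samples.
    Each event picks a layer plus a random gain and rate, with
    randomized inter-onset times, so the mix never loops."""
    rng = random.Random(seed)   # seeded here only for reproducibility
    events, t = [], 0.0
    while t < duration:
        events.append({
            "sample": rng.choice(layers),
            "start": t,
            "gain": rng.uniform(0.3, 1.0),   # assumed range
            "rate": rng.uniform(0.8, 1.25),  # assumed range
        })
        t += rng.uniform(0.5, 2.0)           # assumed inter-onset time
    return events

# Hypothetical seashore layers.
sched = layered_ambience(["gull", "wave", "wind"], duration=10.0)
print(len(sched) > 0, sched[0]["start"])  # True 0.0
```

Because the randomizing factors are themselves behaviors in DA, the same mechanism can also be steered by outside events such as user interaction.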

Audio In Visual Simulation

Finally, DA provides a great environment either for importing data or simulating systems and then displaying the results as dynamic text, color, 2D, 3D animation and sound. These simulations can be interactively manipulated. The script used to generate the display is simple to customize because it is composed of a hierarchy of behaviors. The resulting animation can be embedded in a web page to share with others.

RESULTS

DA’s initial goal is to enable a new class of authors to develop media-rich, animated, web content and illustrations. Providing access to DA from a wide variety of programming language environments has made the technology more accessible to content developers and has greatly enhanced its acceptance. In addition, DA is unique in the breadth of capabilities it brings to the Java environment.

Early hands-on sessions have demonstrated that programmers take quickly to DA concepts and quickly begin creating animations. These same programmers would find it very difficult to create comparable animation using existing media libraries without considerably more training.

CONCLUSION

DA is an exciting new system for the creation of dynamic multimedia animation. The DA engine is accessible from many of the popular programming language environments and can easily be integrated into World Wide Web content. The engine, scheduled to ship with Microsoft’s Internet Explorer 4.0 release, will enjoy a very wide distribution. The DA SDK (software development kit) is available free of charge and is in the process of being ported to additional platforms. Authoring tools for DA are currently in development.

REFERENCES

  1. IRIS Media Libraries Programming Guide. Silicon Graphics.
  2. Elliott, C., Schechter, G., Yeung, R., Abi-Ezzi, S. TBAG: A High Level Framework for Interactive, Animated 3D Graphics Applications. ACM SIGGRAPH '94. 421-434.
  3. Backus, J. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM 21, 8 (1978), 613-641.
  4. Ullman, J.D. Elements of ML Programming.
  5. Rate on PCM sounds changes the nominal playback sample rate, changing the pitch.
  6. Phase is defined in terms of time and affects the position within a sound. A looping sound phased –0.1 seconds will start playing 0.1 seconds from the end of the sound.

 

ACKNOWLEDGEMENTS

The author wishes to thank David Thiel for his contributions to the author's understanding of dynamic audio (specifically layered sound), and the entire DirectAnimation team for making this product a reality and for introducing him to many of these concepts.