A Blueprint for Using a Reactive Performance Space
Robb Lovell
Technical University of British Columbia
robblovell@nexus.techbc.ca
A reactive performance space is a theatrical environment that enables physical actions to affect and manipulate electronic media. These spaces allow performers to improvise with media through a variety of means.
Electronic media consists of any media that can be controlled from a computer, generally divided into four categories: visuals, light, sound, and mechanical systems. Physical actions within the space consist of anything that can be sensed and interpreted by a computer: video-based sensing, tracking systems, sound sampling, pitch detection, or analog sensors (heat, touch, bend, acceleration, etc.).
As in a traditional theater, the systems that manipulate media (such as the lighting, visuals, sound, or kinetic mechanical objects) are resident in the theater as part of its infrastructure. That is to say, the equipment used for the creation of interactive effects, the sensors and computers, is dedicated to the space, not brought into it to accomplish particular effects.
In a reactive performance space, it is important that all the theater media be controllable from the computer systems through algorithmic processes, and that the computers have the ability to sense the environment in some way. These two attributes, connected together, provide the infrastructure of a reactive performance space.
Examples of reactive spaces are the Intelligent Stage (Arizona State University, The Institute for Studies in the Arts, Tempe, Arizona) and the åR-Space (University of Århus, Center for Advanced Visualization and Interaction, Århus, Denmark). ASU's Intelligent Stage is set up with many media and sensing capabilities, including lighting, digital and analog video (16 channels, 3-5 Laser Discs, 5 mixers), composition, DAT, CD, and tape playback, digital signal processing, video-based motion sensing, sound sensing, infrared sensing, and many other kinds of analog sensors. The åR-Space is equipped with mini-disc and CD playback, analog video (8 channels, 2 DVDs, 4 mixers), lighting, speech recognition, sound sensing, digital signal processing, video-based motion sensing, and several types of analog sensors.
Case Study: Story in a Line
Many of the techniques for mediated performance are established, but not widely known, and many are still being formed as the theory behind interactive systems develops. The least known area is the extraction of meaning from sensed data and the translation of that data into control signals. This is the "guts" of the process, and it often requires as much time to put together as the media itself, especially if it is compositional in nature.
An example of the process required can be seen in a case study of a simple environment created for the åR-Space called "Story in a Line". The goal of this environment is to allow the performer to advance a non-linear story by visiting various locations on stage, and to bring in an accompanying sound landscape through movement within the stage space.
Media prepared for the environment consists of ten segments of a story stored in audio files, plus various short environmental whisper sounds of words and phrases related to the story. The story, written by Torunn Kjølner (Department of Dramaturgy, Århus University), is a dream sequence of an accident as viewed from a kind of angelic perspective. The story can be advanced in a loose non-linear fashion but is experienced best linearly. The media is stored on disk and played back through the use of MAX and MSP[1].
Sensing is accomplished through a video camera that has an overhead view of the stage space. The physical action detected is simply the change in light at particular locations on stage. Ten locations in the video are processed for an object that is light against a dark background. The assumptions made about the environment are that people in it appear light in color, and that when a person stands in a particular location, they wish to hear that location's piece of the story from beginning to end (the story will repeat if the person leaves that space and returns after the segment is over, and in fact no other stories can be triggered while a story is playing). It is also assumed that, because the camera is overhead, positions in the image plane roughly correspond to locations on stage.
With these assumptions, the meaning extracted from the space is the location of the performer on stage. To do this, an image is obtained from the overhead camera through a digitizing board. First the background is subtracted from the incoming image so that the people stand out more clearly. Once this is done, the ten areas are processed with a threshold function that divides the intensity range between light and dark. At this point the computer can determine whether a person is in a particular location. This information is represented as a series of ten numbers: if a value is below 2 there is no one there, and if it is above 2 someone is in the location viewed by the video sensor.
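For concreteness, the sensing and processing steps might look something like the following sketch, written in Python with NumPy for illustration only (the installation itself uses its own video-based sensing software, not this code). The region coordinates and frame layout are placeholder assumptions; only the cut-off value of 2 comes from the description above.

```python
import numpy as np

# Ten stage locations given as (row, col, height, width) windows in the
# overhead camera image; these coordinates are placeholders, not the
# values used in the actual installation.
REGIONS = [(40 * i, 60, 40, 40) for i in range(10)]
THRESHOLD = 2  # above 2: someone is standing in the location

def sense_regions(frame, background):
    """Return one value per region: the average brightness left after the
    dark background is subtracted, so a light body stands out."""
    diff = np.clip(frame.astype(int) - background.astype(int), 0, 255)
    values = []
    for (r, c, h, w) in REGIONS:
        window = diff[r:r + h, c:c + w]
        values.append(window.mean())   # residual brightness in this location
    return values                      # ten numbers, one per stage location

def occupied(values):
    """True/False per location, using the same cut-off as described above."""
    return [v > THRESHOLD for v in values]
```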
From here, what remains is a translation of the ten numbers into activation of the stories. Several things need to be accomplished. First, any noise from the uncertain environment of the stage needs to be removed. Second, if more than one area is active, a decision needs to be made about which one takes priority. Third, if a story is playing, all the trigger areas need to be suppressed. Fourth, the number representing the active location needs to be translated into a selector for the particular story to be played for that location. This value is sent to a controller process that activates the story.
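A minimal sketch of this translation step, continuing the Python example above. The few-frame noise filter and the lowest-index-wins priority rule are assumptions made for illustration; the paper does not specify how noise is removed or how priority is decided.

```python
def translate(values, story_playing, history, settle_frames=5):
    """Turn the ten sensor values into at most one story selector.

    values        -- the ten per-location values from the sensing step
    story_playing -- True while a story segment is still being spoken
    history       -- per-location count of consecutive active frames
    """
    if story_playing:                       # rule 3: suppress every trigger
        return None, [0] * len(values)

    trigger = None
    for i, v in enumerate(values):
        history[i] = history[i] + 1 if v > 2 else 0
        # rule 1: a location counts only after several consecutive active
        # frames (a simple noise filter); rule 2: the lowest index wins
        if history[i] >= settle_frames and trigger is None:
            trigger = i

    # rule 4: location index 0..9 becomes story selector 1..10
    selector = None if trigger is None else trigger + 1
    return selector, history
```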
The controller resides in a MAX patch that selects audio files based on a number, 1 through 10, sent to it. Generation occurs within MSP, which plays the audio file from disk. The volume of the sound is modified through a DSP object within MSP, and also through a Yamaha 03D mixer. Rendering is through the usual amplifiers and speakers.
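The controller itself is a MAX patch, but its logic can be mirrored in a few lines. In the sketch below the file names and the play() hook are hypothetical, standing in for the MSP playback objects.

```python
# Hypothetical file names for the ten story segments.
STORY_FILES = {n: f"story_{n:02d}.aif" for n in range(1, 11)}

class StoryController:
    """Mirror of the MAX patch: receive a selector 1-10, start that segment,
    and refuse new selectors while a segment is still playing."""

    def __init__(self, play):
        self.play = play       # playback hook; assumed to return an object
        self.current = None    # with an is_playing() method

    def on_selector(self, n):
        if self.current is not None and self.current.is_playing():
            return             # a story is running: ignore the trigger
        self.current = self.play(STORY_FILES[n])
```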
A second media structure exists within this environment. It forks off from the first after the sensing step. Its goal is to activate various overlaid sounds that are played as a background to the main story. These sounds can be layered, and do not have to wait for other sounds to finish before triggering. Ten different regions are processed to determine whether those areas of the stage are active; as before, after this step the computer knows whether something is happening in the part of the stage corresponding to each region of the camera's view.
The translation step filters noise as before, but no prioritizing of sounds is required since samples can interrupt each other. Because the sounds are short, it is desirable to trigger a sound only when a sensor becomes active, not while it remains active, so processing includes a test of whether the state of the sensor has changed. The final translation is from the sensor number to the sound sample number sent to the controller.
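The change-of-state test can be as small as the following sketch, which fires a whisper sample only on the rising edge of each region.

```python
def whisper_triggers(active, previous):
    """Edge detection for the whisper layer: a sample number is emitted only
    when its region goes from inactive to active, not while it stays active.
    Returns the samples to start and the state to remember for next time."""
    fired = [i + 1                        # region index 0..9 -> sample 1..10
             for i, (now, before) in enumerate(zip(active, previous))
             if now and not before]       # rising edge only
    return fired, list(active)
```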
From this point the remaining infrastructure merges with the first chain of events from control to rendering.
Media Structures
There is a set of capabilities, dependencies, and interactions between the various media in a reactive performance space. While one can define the components of a reactive performance space, it is not clear how to define the capabilities and limitations of any given medium, or, more precisely, how to define the building blocks that can produce a specific interactive environment or effect. Each type of media is capable of a range of interactive modes depending on the equipment used to generate and render it. Each medium also interacts with and depends upon other media in various ways. Each type of sensor has its own limitations, capabilities, and dependencies on the structure of the environment, and each will have different affinities with the various types of media.
Figure 1. The elements of a media structure. A media structure is a series of linked steps that are used to create a mediated environment.
To start addressing these capabilities, dependencies, and interactions, it is useful to think about the structure of each separate media channel used: not as a particular effect in a particular medium, but as a type of media and its connection to that which creates and manipulates it. Each type of media used, whether video, slides, or CD sound, consists of a linked set of steps that can be called a media structure [Figure 1]. These steps occur on and within the pieces of hardware that make up the media structure.
A media structure is built from the physical and logical pieces that take an action and form it into a rendering of some electronic media. Generally these steps are divided into three broad categories: Sensing, Processing, and Response[2]. Yet to better understand the inner workings of a reactive environment, it is useful to subdivide the categorization into more elements:
Action -> Sensing -> Processing -> Translation ->
Control -> Generation -> Manipulation -> Rendering
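One way to picture this chain is as a sequence of functions through which data flows. The sketch below is only schematic, with placeholder bodies for each step; in practice any of them may be hardware, a MAX patch, or an algorithm.

```python
# A media structure as a chain of steps.  Each body below is a placeholder.
def sense(action):     return {"raw": action}                 # sensor + digitizer
def process(d):        return {**d, "meaning": d["raw"] > 2}  # extract meaning
def translate(d):      return 1 if d["meaning"] else 0        # meaning -> signal
def control(signal):   return {"cue": signal}                 # start/stop decisions
def generate(cue):     return {"media": cue["cue"]}           # create the media
def manipulate(media): return {**media, "volume": 0.8}        # modify a parameter
def render(media):     return media                           # speakers, projector

def run(action):
    """Push one physical action through the whole chain."""
    data = action
    for step in (sense, process, translate, control, generate, manipulate, render):
        data = step(data)
    return data
```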
It is useful to think about the equipment for producing a media structure as having various attributes that are either fixed in a standard configuration or malleable in some way. For instance, for lighting in a theatrical context that uses dimmers, the dimmers are usually a fixed entity, but the positioning of the lights within the space is a malleable one.
In a non-reactive context, the media structure for controlling a particular type of media consists of between two and four steps: control, generation, manipulation, and rendering. For instance, if the media is sound, these four parts could consist of a person pressing play on a tape deck, which generates some sound that is sent to a sound mixer and then amplified into some speakers. The control is the person, the generation is the tape deck, the manipulation is the sound mixer, and the rendering is accomplished through the amplifiers and speakers. In this media structure the fixed pieces are the tape deck, the amplifiers, and the speakers, while the malleable parts are the positioning of the speakers and the content of the tape.
More specifically, control is an entity that initiates events that direct the generation of media. Control is generally responsible for starting and stopping things, and directing the flow of when things happen. Generation is an entity that creates the media. This can be an algorithmic process inside a computer or a piece of hardware like a DVD player. Manipulation is a step that modifies some parameter of the created media. Control, generation, and manipulation are loose terms and, in some contexts, one or more of them can be thought of as one entity. Rendering is the entity or technique that displays the media. An example of this is a video projector and screen, or amplifiers and speakers.
It is important to note that these steps aren't necessarily fixed in order. For instance, in an algorithmic process, the generation and manipulation steps could be repeated several times before the results are actually rendered.
In a reactive context, the media structure chain can be thought of as containing at least four more steps: action, sensing, processing, and translation. By way of example, consider an interactive environment in which a heat sensor chooses between selections on a CD. The physical action is the heat of something. The sensing is accomplished by a heat sensor and some digitizing device. The digitized raw data requires processing to remove noise, smooth changes, or extract further information, such as the direction and rate of change of the temperature, its highs and lows, and whether it reaches a particular value. The extracted information that results from processing the raw sensor data then needs to be translated into signals that can be used by the controllers.
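A rough sketch of the processing and translation steps for such a heat sensor follows; the smoothing window, trigger temperature, and track mapping are all arbitrary choices made for illustration, not values from any actual installation.

```python
def process_heat(samples, window=8, trigger_temp=30.0):
    """Extract features from raw heat-sensor readings: a smoothed value,
    the rate of change, and whether a trigger temperature was reached."""
    recent = samples[-window:]
    smoothed = sum(recent) / len(recent)       # average away sensor noise
    rate = recent[-1] - recent[0]              # rising or falling, and how fast
    reached = any(s >= trigger_temp for s in recent)
    return {"temperature": smoothed, "rate": rate, "reached": reached}

def translate_heat(features, track_count=12):
    """Map the extracted features onto a CD track selector (1..track_count)."""
    if not features["reached"]:
        return None                            # below the trigger: do nothing
    # warmer readings pick later tracks; the mapping itself is arbitrary
    return int(features["temperature"]) % track_count + 1
```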
Again, defining the terms: Action is the physical phenomenon that will control or affect the media; examples are the movement of an arm, breathing, or heat. What is detected is something within the environment, whether the actions of a human body or some other physical system. Sensing is the act of transforming the physical phenomenon into a digital representation. Generally, sensing involves reading data from a sensor of some sort that measures some aspect of the physical world. This aspect may be only a manifestation or parameter of the phenomenon, not the phenomenon itself that needs to be perceived. For instance, a video camera as a sensor does not register movement; it only registers reflected light. Processing is the algorithmic extraction of meaning from the sensed data. In the case of a video camera that detects light, movement can be inferred from changes in the image plane, because the light reflected from objects changes when they move. Again, processing may not unambiguously extract meaning: in the case of a video camera and the extraction of movement, it is also true that when the light changes, things may not be moving. Translation is the process of taking data structures that represent meaning and producing signals that can be used by steps down the line, such as a controller, to generate or modify media.
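The point that a camera registers light rather than movement can be made concrete with a frame-differencing sketch: what is measured is only the amount of change between images, which a switched light can produce just as easily as a moving body. The noise floor below is an illustrative value.

```python
import numpy as np

def apparent_movement(frame, previous, noise_floor=10):
    """Infer movement from two consecutive camera images.  The camera only
    registers reflected light, so what is measured is change above a noise
    floor; a light turned on or off will read as 'movement' too."""
    change = np.abs(frame.astype(int) - previous.astype(int))
    return (change > noise_floor).mean()   # fraction of the image that changed
```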
Within the media structure, the last part, the media generation part, is the most easily understood, and the sensing part is the least intuitively understood. The reason is that computers do not perceive reality in the same way that a human body does. In fact, a computer perceives very little information from sensed data, and relies heavily on assumed facts about the environment from which the data is extracted. The human body's sensing system is structurally coupled to its neurophysiological processing and motor response systems, and it is only when the assumptions break down, as with optical illusions, that this becomes apparent[3].
This media structure is certainly not a fixed linear progression from action to rendering; that would be too simplified a view to be useful in all situations. Many times it is useful to combine information from several sensors to deduce the physical reality of an action or state within the real world. In this case, two streams running from action to sensing to processing merge together at the translation step. Loops are also possible, where processed sensor data is translated into another representation that is then combined back into the processing step for another sensor.
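As one example of two streams merging at translation, the camera-based location sensing from the case study could be cross-checked against a second sensor before a selector is produced. The floor pressure sensor below is purely hypothetical, used only to show the shape of the merge.

```python
def translate_merged(camera_location, floor_pressure):
    """Merge two processed streams before translation: the camera gives a
    coarse stage location (or None), a hypothetical floor sensor reports
    which locations carry weight, and only agreement produces a selector."""
    if camera_location is None:
        return None
    if not floor_pressure[camera_location]:
        return None                  # the camera saw light, but no one is there
    return camera_location + 1       # location index -> story selector
```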
In this way, these categories are building blocks that can guide the creation of general-purpose algorithms, physical systems, and techniques for constructing interactive environments. This idea is central to the structure of the Intelligent Stage. In a traditional theater, the techniques for accomplishing particular effects are known and used over and over. Likewise, the techniques for creating more complex theatrical environments should be similarly known and reused. One difference exists, however: many of these reusable techniques and systems are internalized as algorithms rather than as physical equipment, cue lists, or traditions.
Summary
The purpose of this paper is to outline some steps that can be used to build interactive media environments or effects. The term media structure is introduced to describe various instantiations of these steps into a coordinated process. The media structure takes sensed data and processes it into a mediated response. Eight building blocks are suggested for thinking about the form of the media structure:
Action: The physical phenomenon used to manipulate the media.
Sensing: The digitization of the phenomenon from a sensor.
Processing: The manipulation of digitized data into meaning.
Translation: The transformation of meaning into control signals for a response.
Control: Coordination of the devices that create or generate the media.
Generation: The device or algorithm that creates the media.
Manipulation: Modification of the media after it has been created.
Rendering: The device that physically produces the media.
Future Directions
Future directions of this research will focus on standard mechanisms for the various steps of the media structure; both algorithmic and physical supports will be identified. Work will concentrate on the process of extracting meaning from sensed data, and on new mechanisms for rendering media that support interactive environments.
[1] MAX is a graphical programming environment originally made for musicians and later adapted for use with mediated spaces. MSP is an audio digital signal processing system that runs within MAX. Both are products of Cycling '74 and IRCAM.
[2] Robert Rowe, Interactive Music Systems, The MIT Press, Cambridge, Massachusetts, and London, England, 1994, pp. 9-32.
[3] William R. Uttal, On Seeing Forms, Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1988, pp. 146-161, pp. 25-48.