Live Interactive Performance Engines
Theatrical Systems for Live Performance and Its Creative Processes
Simon Fraser University, Media Arts and Sciences
Abstract. This thesis describes a new theory of how systems that support live-performed, interactive, and mediated work can be created so that they sustain cultures of creativity in the formation of artistic work. The theory generates a new model for systems that manipulate image, light, sound, and actuators through interactivity in live interactive theatrical performance settings.
This thesis presents a theory that describes an ecology for live interactive performance where specific software models, artistic processes, and collaborative techniques are defined in cooperation with traditional performance practices in order to increase the expressive range of artists working with digital media, interactivity, and new computer technology.
Traditional techniques and processes used to create live performances interfere with smooth collaboration between new technologies and artists. Because no traditions have been built up for new types of media or for the incorporation of sensing for interactivity into live performance, the time required to create a finished performance work is too long to fit into the creative cycle of experimentation and rehearsal. This causes a technological time lag that slows down or hinders the creative process.
By standardizing the structures through which artists use new technologies in performance, and providing appropriate tools for authoring within live and real-time contexts, a useful paradigm emerges that assists creativity. The designs of the systems that artists use are critical to the artistic process, and the model presented is shown to work within an ecological framework where the culture of performance making meshes with technology to support creative work. The result does not solve every problem of live mediated performance, but it does create a model that has an expressive range beyond current theatrical systems and has built-in mechanisms that allow it to expand to incorporate new media genres and interactive techniques as they arise.
In this thesis I will use terms that have many meanings within computer science, media arts, live performance, virtual reality, programming, and interactive media worlds. Because of this, it is necessary to define precisely what they mean in this context.
Live performance is a rehearsed, hard real-time, embodied, and physically manifested event witnessed at the time of presentation by an audience. Performance is enacted by performers for an audience, but may involve virtual and media elements.
Virtual refers to anything that is manipulated or processed in the world of the computer; physical refers to anything occurring or embodied in the real world, that is, the world outside the computer.
Theatrical systems are the combined physical and virtual technologies that are coordinated together for use in live performance and include the processes by which those technologies are used.
The word media describes the content presented to the viewer in the context of a live performance. Digital media is any media that can be manipulated, coordinated, and otherwise controlled by a computer. (I use the term media to mean digital media and use the terms interchangeably.) Specifically, digital media is the output of the theatrical system supporting a live performance work. Mediums include light, visuals, audio, atmospherics, and actuators. The specific form of a particular media type can be varied and controlled in many different ways. For instance, visuals include video, graphics, animation, slides, and so on. Sometimes there is crossover between mediums, as when a video is treated as if it were a light. Media is organized into channels, each representing a specific content stream.
Digital media is rendered to the physical space through devices that are controlled or manipulated by a control system. Digital media can be modified, layered, and transitioned within the control system. Modification of media refers to the application of effects that change the content of a single channel. Layering refers to the spatial combining of many channels of media of the same type at the same time. Transitioning is the temporal changing of one media channel into another, of one effect into another, or of one layering setup into another; in short, the movement from one specific look to another. Mixing is the process of modifying, layering, and transitioning media channels.
The word sensing refers to the processes used by the computer to determine some aspect of what is happening in the environment of the performance. A sensor is a device that detects a physical phenomenon in the physical world. Specifically, sensing is the input of the theatrical system supporting the work. It is critical to recognize that sensors do not sense; they simply detect phenomena. Sensing is accomplished by algorithms that couple assumptions made about the environment with data digitized from sensors to form knowledge.
Interactivity or interactive media is media that responds or reacts through the agency of a computer to human or environmental action.
A media structure consists of all the elements necessary to support the presentation of and interaction with media. This includes sensing, assumptions made about the environment, mixing, rendering, translation mechanisms between sensing and control, devices, and/or compositional engines. A media structure is built from the physical and logical pieces that take an action in the context of live performance and form it into a rendering of some digital media response.
Interactive live performance is a live performance that involves the use of media, some or all of which is interactive or reactive to the participants of the performance through the agency of computer technology. Necessary participants are performers and audience, but this does not exclude the participation of artificial entities. (Footnote: There is a gray area here. Should artificially intelligent agency ever be created, these entities might be considered sufficient to be the only performers should they require rehearsal, be capable of improvisation, and be able to make mistakes. Similarly, they could be considered sufficient to be the only audience should they be able to form opinions about what they see, relate their experiences to the performance, and react in a multitude of unpredictable ways to what they see.)
A live interactive performance engine is a specific model of a theatrical system presented in this thesis.
The roots of this work are founded in the creation of a performance space called the Intelligent Stage. This studio was created at the Institute for Studies in the Arts at Arizona State University between 1994 and 1999 and is still used as a research platform today. During this period it was an ongoing research project into interactive media issues in the context of dance performance works. The lab is a performance space that responds to the actions of artists as they move and speak, allowing them to control and interact with electronic theatrical elements such as sound, lighting, graphics, video, slides, and robotics during live performance events. The system's sensing occurs through computer vision, motion capture, speech recognition, and many other kinds of sensors. Media responses are accomplished through several controller computers that manipulate the theatrical electronic media. This system and the experience acquired in creating the Intelligent Stage will be drawn upon as part of the methodology for this thesis.
With the use of interactive media in performance, it has become more difficult to create productions quickly and efficiently. While time-tested traditions are in place for lighting, sound, and the sequencing of events (within a theatrical context), no standard exists for the use of video, graphics, robotics, sensing, or ad-hoc compositional systems (a standard for video is emerging, but has not yet been settled). Artists, directors, and designers tend to start from scratch each time something is created and spend as much time on technical aspects as artistic ones. This thesis does not replace creative experimentation with technology; it instead creates a base from which creativity can thrive. It identifies baseline standards and support structures that accommodate a wide variety of already established processes and traditions.
In the areas introduced above, video, graphics, robotics, and sensing, there are three distinct domains that require specific theories: sensing, media, and the connection between them, interaction. Media is defined in this context as the output of a theatrical system, sensing as the input. Interaction is defined as the process by which an input influences the output process. Output and input refer to information flowing to and from the space. (This means that devices that generate the media, such as DVD players, are not considered sensing devices, whereas a button that means "play" on a DVD player is.)
It is important to recognize that there are established theater traditions that serve as guidelines for how to use certain types of media. These traditions take the form of processes, knowledge, and commercialized products, and are designed to minimize effort and maximize the flexibility of the particular media each tradition addresses. Specifically, lighting and sound are handled in particular ways in a theatrical context, down to modular definitions of equipment purposes; the roles people play in the creation of works, the operation of equipment, and the remounting of works; the types of activities performed; and the mechanisms by which the equipment is controlled. As a working definition in this context, a tradition consists of the methods, processes, sets of equipment, and protocols that are widely accepted as useful implementations or processes.
The issues that need to be addressed in this thesis are twofold. First, video, graphics, and other peripherally used media need to be formed into general-purpose techniques of operation that can address a wide expressive range of usage. This would allow for experimentation with these media types while still allowing for specific implementations that expand the range of the general-purpose solution. It involves identifying a wide array of practices and ways of using these media that are in place right now and finding a technique that supports many of these usages. The second issue is the connection between the sensing and media domains. What are needed are theories that allow the messy world of environmental sensing to be linked to the precise world of media control, along with programmatic tools that link the difficult, slow, inertia-laden world of computers, sensing, and media to the fast-moving, improvisational, quick-adapting world of the artist.
Four obstacles prevent the technology from being used during the creative process: time, money, technical problems, and the absence of specific technical knowledge. This thesis seeks to make these obstacles easier to overcome through the design of a software support structure. This software must bring together new ideas of interactivity with old and reliable traditional methods, along with new programming techniques and methods from current practice.
Some technological barriers are intractable and some are malleable. For instance, with lighting, much of the process involves the physical labor of hanging and focusing lights and cable in preparation for programming. This physical labor is not made easier by the design of new support software, nor is the technology likely to change much over a long period of time. However, the programming of a lighting change can be made easier through the design of good software. Similarly, other types of media require a preparation phase. Recorded video must be shot and edited into the right format and content before it is used. Even live visuals from cameras require the setup of cameras and digitizers before they can be used in a live context. However, the programming of and experimentation with prepared or live video can be made easier with the use of good software. In general, the preparation phase of live performance making involves "intractable" problems, whereas the experimentation phase involves more "malleable" problems.
Traditional techniques and processes create problems that prevent smooth collaboration between new technologies and artists. Because no traditions have been built up around the new types of media being incorporated into live performance, no tools and processes exist to manage the time required to create something, or to change and manage things once they have been created. The time required to create something is too long to fit into the creative cycle of experimentation and rehearsal, causing a technological time lag that slows down or hinders the creative process.
The creation of new traditions extends to sensing practices as well. The sensing domain is even more in its infancy than any of the media modes. In fact, sensing modalities within a theatrical context are only now becoming recognized and defined. There are emerging practices in effect in advanced high-end systems for rock concerts, Las Vegas caliber shows, and other events that might be incorporated and adapted to everyday usage.
Currently, there are at least four levels at which people interact with technology that mediates performance. These levels are the roles that people play in the use of technology in the creation of a production. The first level is that of toolmaker: a role where a person creates a machine, algorithm, technique, or piece of equipment that is used in live performance. A second role is that of programmer, in the sense of someone who writes the instructions or scripts for how a particular show is presented. The third role is that of a designer/artist who creates the conceptual formulation of how the performance will work. A fourth role is that of performer. Any one person can play any of the four roles, but in practice most people play only one or two.
Any system that attempts to formulate a theory about systems that support live interactive performance must address issues at all four levels.
The practice of creating live performance has always involved the inhabitation of space by its creators within a design process framework. The key has always been one of inhabiting the work to discover its meaning and, through the practice of the work, create its form. There is a constant cycle from ideation, to design, to exploration, to rehearsal and practice, and back again, and finally into performance. With more and more technologies becoming available for use within the creative process, the expense and complication of those technologies has driven a wedge between the artists and their medium (the stage and its instruments).
Traditionally, artists work with as much of the final product as possible during the creative stage of building a work. Initially, there may be a prop here, a box there to stand in for what is imagined during early formation. As the production matures, more and more elements of the final product are brought to the rehearsal space for discovery and practice. This cycle continues until the finished product is rehearsed and refined where it will be performed.
In a production involving digital media or interaction, it is often difficult to bring the technology to the rehearsal space because of the time and money required to transport and set up the equipment. Increasingly, instead of inhabiting a space with the physical and digital sets, costumes, and props, artists are forced to imagine the aesthetic results of their actions and interactions with many parts of the theatrical systems. The first integrations are often left until the final hour before the work is shown, often with less than stellar results.
In the recent past a détente was established between technicians and artists. The theatrical technologies of lighting and sound, used in a non-interactive paradigm, can be operated easily through standardized and modularized equipment. (Other types of technologies exist and have been used in the past, but in more specific situations.) Because these technologies are relatively easy to integrate into a production, it was acceptable for them to be incorporated in a measured and planned way. Traditions and processes have formed around these technologies that allow maximum impact and integration into a production by skilled and experienced artists.
However, with the explosion of new technologies (such as visuals and complicated lighting devices), the availability of computer control, and the possibilities of interactive media, this paradigm is no longer adequate. Last-minute, planned, measured integration by skilled artists can produce adequate results, but those results usually fall far short of what can be accomplished. Better results come from incorporating digital authoring early within the cycle of discovery in the design process of creating a live performance. The artist must be able to inhabit the digital space along with the physical space at the time of creation. The technology must be brought to the rehearsal space, or the rehearsal space brought to the technology.
My background puts me in a good position to know how to construct new systems and processes that support the activity of improvising using technology within the creative process. I will bridge years of experience creating live performances (both traditional and unconventional) with my experience and knowledge in computer science to formulate a theory of live interactive performance engines. Because I am trained as a computer scientist and schooled as a performer and performance creator, I am uniquely qualified to understand and solve the issues with which many performance makers are wrestling.
As part of the approach, a prototype system could be constructed that demonstrates the principles described. However, the scope of such a project would be outside the bounds of what can be accomplished by this thesis. The model described here requires too much work for one individual to implement. As part of the theory, a way will be presented to more easily implement the model through the mechanism of an open source effort. (One idea is to take the support structure of the open source projects of Pd and EyesWeb and integrate and modify them to include new elements that improve their overall performance and utility within the context that the thesis is addressing.)
Whatever the approach for constructing a system, it should be removed from the demands of live performance creation during its development. Normally I would not apply this constraint, because such demands can help drive the process forward. However, since this is a general support structure for live performance, subjecting it to the demands of one specific project would limit the capabilities of the system.
It is important to realize that the process of creating a multimedia work is collaborative, and that established traditions exist for realizing theatrical works. The purpose here is to suggest techniques, skill sets, and technology that can adapt to this environment.
In my experience programming live interactive performances and installations, and in my observations of other artists' processes, I find myself using similar techniques. Each stage of creating an interactive performance has particular problems that need to be solved. By identifying the conceptual parts of a media structure used in performance, and the techniques used in each part, I believe that several theories can be developed that will enable the creation of a generalized support mechanism. This framework will be used to construct components that are used in individual live works and reused in others.
In general, a way of structuring the software model is to identify building blocks or aspects that can be defined that form a media structure. For live performance engines this will include the following structural aspects:
Environment: Setup and assumptions about the physical space.
Action: The physical phenomenon used to manipulate the media.
Digitization: The digitization of the phenomenon from a sensor.
Sensing: The manipulation of digitized data into meaning.
Knowledge: Central repositories of known information about the system and environment.
Translation: The transformation of meaning into control signals for a response.
Media Repository: Prerecorded and live media maintenance.
Control: Coordination of the devices that create or generate the media.
Generation: The device or algorithm that creates the media.
Manipulation: Modification of the media after it has been created.
Transition: Moving from scene to scene in seamless ways.
Devices: Layer for abstractions of physical devices.
Rendering: The device that physically produces the media.
Physical Setup: The wiring and focus of media devices and sensors.
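As a sketch only, the structural aspects above can be written down as an enumeration, with the three conceptual groupings expressed as an assumed partition over them; the identifier names and the exact placement of each aspect into a grouping are my illustrative assumptions, not definitions from the thesis:

```python
from enum import Enum, auto

class Aspect(Enum):
    """The fourteen structural aspects listed above, as an enumeration."""
    ENVIRONMENT = auto()
    ACTION = auto()
    DIGITIZATION = auto()
    SENSING = auto()
    KNOWLEDGE = auto()
    TRANSLATION = auto()
    MEDIA_REPOSITORY = auto()
    CONTROL = auto()
    GENERATION = auto()
    MANIPULATION = auto()
    TRANSITION = auto()
    DEVICES = auto()
    RENDERING = auto()
    PHYSICAL_SETUP = auto()

# An assumed split into the three conceptual groupings: sensing system,
# coordination (the linkage), and media engine.
SENSING_SYSTEM = [Aspect.ENVIRONMENT, Aspect.ACTION,
                  Aspect.DIGITIZATION, Aspect.SENSING]
COORDINATION = [Aspect.KNOWLEDGE, Aspect.TRANSLATION]
MEDIA_ENGINE = [Aspect.MEDIA_REPOSITORY, Aspect.CONTROL, Aspect.GENERATION,
                Aspect.MANIPULATION, Aspect.TRANSITION, Aspect.DEVICES,
                Aspect.RENDERING, Aspect.PHYSICAL_SETUP]
```

The value of even this trivial formalization is that every component built for a live performance engine can be tagged with the aspect it implements, which makes reuse across productions easier to reason about.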
From my experience, these building blocks or aspects can be categorized into three conceptual groupings: a sensing system, a media engine, and the linkage between these two components or coordination elements.
The sensing system is represented by the first several aspects in the list above. It consists of a set of sensors in a particular configuration within a structured physical space, together with the processing necessary to extract some interpretation of what is really happening in the observed space. This sensed interpretation is usually limited to the particular set of assumptions dictated by the structure of the physical space. The output of the sensing environment is the representation of the interpreted meaning in one of several possible forms, such as a list of attribute-value pairs, the current value of a state machine, or some other knowledge representation schema.
An example of a complete sensing environment might be as follows. The physical environment consists of a single camera with a wide-angle lens mounted at least 12 feet above the ground and pointing downward. It is assumed that only one person will be viewed at a time, and that the person is distinguishable from the background (they aren’t wearing the same color as the floor and they are lit in some way). The video images from the camera are processed to determine nine parameters about the person: speed, position, direction, motion, time in the current state, time in the room, present/not present, consistency of actions, and repeating/not repeating movement. These parameters are represented by one or two variables each. For instance, speed is represented by two variables: one that divides the speed into low, medium, and high, and another that is a number representing the velocity of the person.
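A minimal sketch of one such sensing step, turning raw tracked values into the attribute-value pairs that form the sensing system's output. The thresholds, parameter subset, and 30 fps assumption are illustrative, not values from the thesis:

```python
def sense_person(velocity, position, frames_present):
    """Map raw tracked values to attribute-value pairs (a partial sketch:
    only a few of the nine parameters described above are shown).
    Thresholds and the 30 fps frame rate are illustrative assumptions."""
    if velocity < 0.5:
        speed_class = "low"
    elif velocity < 2.0:
        speed_class = "medium"
    else:
        speed_class = "high"
    return {
        "speed_class": speed_class,   # categorical speed variable
        "speed_value": velocity,      # numeric speed variable
        "position": position,         # e.g. (x, y) in camera coordinates
        "present": frames_present > 0,
        "time_in_room": frames_present / 30.0,  # seconds, assuming 30 fps
    }

person = sense_person(velocity=1.0, position=(3, 4), frames_present=60)
# person["speed_class"] is "medium"; person["time_in_room"] is 2.0 seconds
```

The point of the dictionary output is that downstream components need not know anything about cameras or pixels; they consume interpreted meaning only.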
The Media Engine consists of the last of the building blocks in the list above and is a conglomeration of media, compositional engines, and interactions organized and controlled through a state machine, cue list, timeline, or other mechanism. The media is organized in both physical and logical configurations within this engine.
The physical manipulation and rendering of the media by the system will most likely be fixed for particular implementations of a media engine. That is, a particular media engine requires a particular physical configuration of equipment to work because it resides in a particular theater space. For instance, a media engine that plays video may require a serial card, a serial cable, and a DVD player of a particular model and make, because it was created within that context. However, this does not preclude quickly adapting the system to new or different hardware. As more media technology is virtually manipulated, more standardized approaches to media manipulation will emerge, making particular implementations more widely applicable to other environments. In addition, careful separation of device-specific and protocol-specific aspects in the model will allow designs to be easily adapted to new situations.
Any particular medium conforms to a particular domain of implementation represented by a protocol for communication, a method of distribution, a method for mixing, and a method for control. (Any of these categories can be human or machine implementations.) Each medium's domain of implementation is completely independent because of the natural differences among the media types (for example, sound is heard and video is seen), except in the area of the method for control. Because of this, the control piece of the media engine is the most important. Any complex manipulation of media requires a state machine of some sort to work. This may be a simple linear cue list, a rule engine, a timeline, or a complex dataflow-style patch with interrelated, non-linear interactions. This manipulation can be directed under human or machine control. Coordination at higher levels is recursive, meaning that meta-cue lists can control the actions of domain-specific cue lists in order to coordinate the timing of activities and create more expressive interactions.
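The simplest of these control mechanisms, a linear cue list, can be sketched as follows; the cue names and actions are invented for illustration:

```python
class CueList:
    """A minimal linear cue list: each cue pairs a name with an action that
    drives some media response when fired. Real engines layer timelines,
    rule engines, and meta-cue lists on top of this basic state machine."""
    def __init__(self, cues):
        self.cues = cues      # list of (name, action) pairs, in show order
        self.index = -1       # no cue has fired yet

    def go(self):
        """Advance to the next cue and fire it (the operator's GO button).
        Returns the cue's name, or None when the list is exhausted."""
        if self.index + 1 >= len(self.cues):
            return None
        self.index += 1
        name, action = self.cues[self.index]
        action()
        return name

# Illustrative show: the "actions" just record what they would control.
fired = []
show = CueList([
    ("lights up",  lambda: fired.append("dimmer 1 @ 80%")),
    ("video roll", lambda: fired.append("play channel A")),
])
show.go()  # fires "lights up"
show.go()  # fires "video roll"
```

Because each cue is simply a callable, a meta-cue list can be built by making one cue's action call `go()` on another `CueList`, which is the recursive coordination the text describes.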
In addition to commonality of control at a macro level, there is also a kind of micro-level glue used to coordinate and drive the activities of different media domains separately from the cue list mechanism. This is a low-level click track that synchronizes media at a very high-precision, machine-view level. SMPTE timecode is the most widely used example, coordinating lighting, sound, and video very precisely and running entire shows accurately down to the frame level.
Between the sensing system and a media engine lies the desert of multimedia interaction. In a custom-made interactive work, this part of the system is tailored and incorporated within both the sensing environment and the media engine. In fact, all three parts are often integrated within the same program space and not easily pulled apart for reuse. If the sensing environment and media engine are abstracted as two separate entities, a need arises to “hook” them together. The linkage between these entities, sensing and media, can be formalized so that the components and techniques used become apparent and reusable.
The input to a media engine from a sensing system consists of different kinds of data: attribute-value pairs, numbers within a particular range of values, on/off values, or the state of a sensing environment. The inputs are a defined set of “needs” for the media engine to run and create the desired interaction. For instance, if an engine seeks to a frame number within a video sequence, it might require as input a number between 30000 and 30100, representing the first through last frames of the sequence. The output of the sensing system might be a number between 1 and 3200, creating a conflict that must be resolved between sensing and media.
One of the formalisms that can help organize the connection between sensing and media is the use of a blackboard-style repository of information generated by the system. The data here can be constructed from sensed information from the environment, maintained from internal states of the media engine, generated by external media devices, or generated algorithmically. By centralizing, or at least formalizing, this information as part of the software interface, the system becomes more maintainable.
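A minimal sketch of such a blackboard, assuming a simple publish/subscribe interface; the key names are invented for illustration:

```python
class Blackboard:
    """A blackboard-style repository: one shared store of named values,
    written by sensing, the media engine, or algorithms, and readable by
    any component. Subscribers are notified when a key they watch changes."""
    def __init__(self):
        self._data = {}
        self._subscribers = {}     # key -> list of callbacks

    def post(self, key, value):
        """Write a value and notify any subscribers to that key."""
        self._data[key] = value
        for callback in self._subscribers.get(key, []):
            callback(value)

    def read(self, key, default=None):
        return self._data.get(key, default)

    def subscribe(self, key, callback):
        self._subscribers.setdefault(key, []).append(callback)

bb = Blackboard()
seen = []
bb.subscribe("performer/speed", seen.append)
bb.post("performer/speed", 1.7)   # e.g. written by the sensing system
bb.read("performer/speed")        # e.g. read by a translation adapter
```

The design choice worth noting is that producers and consumers never reference each other directly, only the shared keys, which is what lets sensing environments and media engines be mixed and matched.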
Another formalism that is useful in conjunction with a blackboard is the idea of a "translation adapter". Adapters resolve conflicts between the data generated by the sensing system and the input requirements of the media engine. In general, there are many types of translation adapters that can be constructed, some very general and others very specific to particular problems. Examples of classes of translation adapters are filters, rescaling of ranges, timing changes, and rate modifications. Other types of translation adapters fall into the category of hybrid or custom operations: adapters made as particular solutions to sensing-to-media connections.
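For example, a "rescaling of ranges" adapter could resolve the frame-number conflict described earlier (sensing output 1 to 3200, media engine input 30000 to 30100). This sketch assumes simple clamping and linear interpolation:

```python
def make_rescaler(in_lo, in_hi, out_lo, out_hi):
    """Build a rescaling translation adapter: maps a sensed value in
    [in_lo, in_hi] onto the media engine's required [out_lo, out_hi],
    clamping out-of-range input. One of the adapter classes named above."""
    def adapt(value):
        value = max(in_lo, min(in_hi, value))        # clamp to input range
        fraction = (value - in_lo) / (in_hi - in_lo) # normalize to 0..1
        return int(round(out_lo + fraction * (out_hi - out_lo)))
    return adapt

# Sensing emits 1..3200; the video engine needs frame numbers 30000..30100.
to_frame = make_rescaler(1, 3200, 30000, 30100)
to_frame(1)     # -> 30000, the first frame
to_frame(3200)  # -> 30100, the last frame
```

Because the adapter is a plain function from sensed value to engine value, adapters of different classes (filters, rate changes, and so on) can be composed in a chain between blackboard reads and media engine inputs.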
The three conceptual groupings of a live performance engine can be mixed and matched with each other so that an artist can take a particular sensing environment and adapt it to a particular media engine. In addition, a sensing environment/media engine pair should be able to be “played” with by modifying thresholds of the sensing environment or the media contained within the media engine.
Both categories, sensing and media, include flows of data; controlling, generative, and mixing processes and hardware; and artistic and technical traditions and techniques.
Within theatrical traditions there exists a base from which innovation occurs in the worlds of sound and light. Every theater is expected to have a sound source, amplification, mixing, speakers, pipe, lights, electric cable, dimmers, and a lighting console. These are the most basic building blocks within the domains of sound and light, with which other devices added to the system must (for the most part) communicate and interact. Similarly, new media and sensing technologies need traditions and standards by which they can be incorporated into the fabric of theatrical practice.
I am contributing to the field of live interactive performance by bringing ideas from the worlds of live performance, electronic music, and computer science to create an improved way of improvising with technology.
Traditional ways of creating live performances, in which artists create the pieces of a performance separately from its technical aspects, no longer support the creative process. With new technologies being used in live performance, it becomes necessary to construct both technical and artistic aspects through interactions within the same time and space.
Current techniques involve an iterative process of preparation and experimentation followed by a period of refinement. Aspects of a performance are created, both technical and artistic, and brought together for integration and experimentation. It is in the integration and experimentation phase of the cycle where current technologies fail to provide enough flexibility to allow artists to quickly figure out what will work creatively.
By designing an ecological framework where software supports the creative process and established traditions, a system is created that enables technology to enter into the creative process in a more intuitive way.
The result of this research will be a theory about a system that will allow artists to play with technology in the creation of live interactive performances that use media and sensing as part of integrated works. This theory will describe a model for a software system that will allow artists to concentrate more on the process of making art, rather than on the details of technological functioning.