Robert Rowe

Interactive Music Systems


The main contribution for this thesis is contained in chapter 5, on machine listening. This chapter is interesting because it outlines a taxonomy of feature-sensing agents used with a machine listening program called Cypher. The system operates strictly on MIDI data; even so, it provides an interesting list of ways to interpret sensed information.


Notes and Events are the main structures that the system analyzes. These structures contain information about the pitch, duration, and timing of actions.


Focus and Decay- Focus is the ability to change the context of what is evaluated by the system. The focus can be wide if values are changing over a large area of space and time, or narrow if detecting the change requires analyzing a small spectrum of possibilities.


Decay- "the adjustment of focal scale over time"


Interesting concepts:


Automatic thresholds



Manual assignment of thresholds





Register agent: "classifies the pitch range within which an event is placed". In Cypher this is represented by two bits. Register as it is used here is for judgments like "it's a high pitch range", "it's in the middle", or "it's a low pitch".


An interesting application of register and focus is described that helps the system make judgments about register. In my vocabulary I would describe this as an automatic threshold technique. Rowe determines the register by keeping track of the maximum and minimum pitch over a period of time. This determines the range over which register can be judged. If the range is less than two octaves, register is classified into two ranges: high and low. If it is more, register is classified into four.
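The register judgment described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not Cypher's actual code: the function name is made up, pitches are taken to be MIDI note numbers (so two octaves is 24 semitones), and the bands are assumed to be of equal width, which the book notes here do not specify.

```python
def classify_register(pitch, low, high):
    """Return (band_index, band_count) for a MIDI pitch in the observed range.

    Assumption: bands have equal width. The notes only say there are two
    bands when the range is under two octaves and four bands otherwise.
    """
    span = high - low
    bands = 2 if span < 24 else 4  # 24 semitones = two octaves
    if span == 0:
        return 0, bands  # degenerate range: everything is in the lowest band
    # Clamp the pitch into the range, then scale its position to a band index.
    clamped = min(max(pitch, low), high)
    index = min(int((clamped - low) * bands / span), bands - 1)
    return index, bands


# Narrow range (18 semitones): only low (0) and high (1) are distinguished.
print(classify_register(64, 48, 66))
# Wide range (48 semitones): four bands, which fits in two bits.
print(classify_register(60, 36, 84))
```

Band index 0 is the lowest register. With four bands the result fits in the two bits mentioned for the register agent.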


Rowe describes decay as a necessary complement to focus. Decay adjusts the range over which register is judged by slowly contracting it over time if new data is not moving the endpoints outward from the min and max values. The timing and rate of this contraction are determined empirically.
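A minimal sketch of how focus and decay might work together on the pitch range. Everything specific here is an assumption of mine: the class name, counting "time" in events rather than clock time, and the values of decay_after and decay_step (the book only says the timing and rate were found empirically).

```python
class DecayingRange:
    """Tracks min/max pitch and contracts the range when nothing pushes it outward."""

    def __init__(self, decay_after=8, decay_step=1):
        # Hypothetical parameters, not values from the book.
        self.low = None
        self.high = None
        self.decay_after = decay_after  # in-range events tolerated before contraction
        self.decay_step = decay_step    # semitones each endpoint moves inward
        self.quiet = 0                  # events seen since the range last expanded

    def observe(self, pitch):
        if self.low is None:
            self.low = self.high = pitch
            return
        if pitch < self.low or pitch > self.high:
            # New data moves an endpoint outward: widen and reset the decay timer.
            self.low = min(self.low, pitch)
            self.high = max(self.high, pitch)
            self.quiet = 0
            return
        self.quiet += 1
        if self.quiet >= self.decay_after:
            # Contract both endpoints, but never past the current pitch.
            self.low = min(self.low + self.decay_step, pitch)
            self.high = max(self.high - self.decay_step, pitch)


r = DecayingRange(decay_after=2, decay_step=2)
for p in [40, 80, 60, 60, 60]:
    r.observe(p)
print(r.low, r.high)
```

The contraction never crosses the most recent pitch, so the current event always stays inside the range being judged.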


Dynamics: Here, dynamics is the relative loudness of the events in the composition. Because of wide variation in the way instruments interpret MIDI data, each instrument must be thresholded differently. Rowe uses MIDI data, so he is limited to the information carried by MIDI events. This is an example of where an automatic thresholding algorithm can't be applied because of ambiguity: the computer cannot know whether the score is being played softly or the instrument has a small range of response.
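Manual assignment of thresholds for dynamics could look something like the sketch below. The instrument names, the velocity cutoffs, and the three-way soft/medium/loud split are all invented for illustration; the only thing taken from the notes is that each instrument gets its own hand-set thresholds on MIDI velocity.

```python
# Hypothetical per-instrument thresholds: (soft below this, loud at or above this).
THRESHOLDS = {
    "piano": (50, 90),
    "synth_pad": (70, 100),
}


def classify_dynamics(velocity, instrument):
    """Classify a MIDI velocity (0-127) using the instrument's own thresholds."""
    soft_below, loud_from = THRESHOLDS[instrument]
    if velocity < soft_below:
        return "soft"
    if velocity >= loud_from:
        return "loud"
    return "medium"


# The same velocity reads differently depending on the instrument.
print(classify_dynamics(60, "piano"))
print(classify_dynamics(60, "synth_pad"))
```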


Density: Vertical and Horizontal. Horizontal is speed and vertical is "the number and spacing of events played simultaneously".


In vertical density, focus and decay are ignored due to the low resolution of the measure.
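The two kinds of density can be illustrated with a small sketch. The function names, the 50 ms grouping window, and the representation of notes as (onset in seconds, MIDI pitch) pairs are my assumptions; the book's own measures may be computed differently. Vertical density is taken as the note count plus pitch span of a simultaneity, and horizontal density as events per second.

```python
def group_events(notes, window=0.05):
    """Group (onset_seconds, pitch) notes whose onsets fall within `window`
    of the first onset in the current group."""
    events = []
    for onset, pitch in sorted(notes):
        if events and onset - events[-1]["onset"] <= window:
            events[-1]["pitches"].append(pitch)
        else:
            events.append({"onset": onset, "pitches": [pitch]})
    return events


def vertical_density(event):
    """Number of simultaneous notes and the pitch span they cover, in semitones."""
    pitches = event["pitches"]
    return len(pitches), max(pitches) - min(pitches)


def horizontal_density(events):
    """Events per second over the passage (0.0 if it cannot be measured)."""
    if len(events) < 2:
        return 0.0
    duration = events[-1]["onset"] - events[0]["onset"]
    return (len(events) - 1) / duration if duration > 0 else 0.0


notes = [(0.0, 60), (0.01, 64), (0.02, 67), (0.5, 62), (1.0, 65)]
events = group_events(notes)
print(vertical_density(events[0]))
print(horizontal_density(events))
```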


There are more:


Attack Speed


Chord identification

Beat analysis and tracking: pulse (regular recurrence of undifferentiated events), meter (differentiates regularly recurring events), and rhythm (pattern of strong and weak beats)

Key Identification

Meter Detection


Higher levels:

Phrase finding




The point is that evaluation of even very simple, clear, reliable data such as MIDI is complicated. Even so, can a general method for analyzing sensor data be established that works across data types and meanings? Is there a canon of techniques that underlies the analysis of sensed data?