Alex François, who created the MuSA.RT and MIMI software and whose software architecture style powered the ESP application, wins the Best Paper Award at the 50th International Computer Conference in Boston, Massachusetts, for his paper "Resonate: Efficient Low Latency Spectral Analysis of Audio Signal".
Alexandre R.J. François’ research focuses on the modeling and design of interactive (software) systems, as an enabling step towards the understanding of perception and cognition. His interdisciplinary research projects explore interactions within and across music, vision, visualization and video games. He was a 2007-2008 Fellow of the Radcliffe Institute for Advanced Study at Harvard University, where he co-led a music research cluster on Analytical Listening Through Interactive Visualization.
From 2001 to 2004, François was a Research Associate with the Integrated Media Systems Center and with the Institute for Robotics and Intelligent Systems, both at USC. From 2004 to 2010, he was a Research Assistant Professor of Computer Science in the USC Viterbi School of Engineering at the University of Southern California. In 2008-2009, he was a Visiting Assistant Professor in the Department of Computer Science at Tufts University, and in 2010 he was a Visiting Associate Professor of Computer Science at Harvey Mudd College.
François received the Diplôme d’Ingénieur from the Institut National Agronomique Paris-Grignon (France) in 1993, the Diplôme d’Etudes Approfondies (M.S.) from the University Paris IX – Dauphine (France) in 1994, and the M.S. and Ph.D. degrees in Computer Science from USC in 1997 and 2000 respectively.
Resonate: Efficient Low Latency Spectral Analysis of Audio Signal
Abstract: This paper describes Resonate, an original algorithm with low latency, a low memory footprint, and low computational cost for evaluating perceptually relevant spectral information from audio signals. The fundamental building block is a resonator model that accumulates the signal contribution around its resonant frequency in the time domain, using the Exponentially Weighted Moving Average (EWMA). A compact, iterative formulation of the model makes it possible to compute an update at each input sample, requiring no buffering and involving only a handful of arithmetic operations. Consistent with online perceptual signal analysis, the EWMA gives more weight to recent input values, whereas the contributions of older values decay exponentially. A single parameter governs the dynamics of the system. Banks of such resonators, independently tuned to geometrically spaced resonant frequencies, make it possible to compute an instantaneous, perceptually relevant estimate of the spectral content of an input signal in real time. Both the memory and the per-sample computational complexity of such a bank are linear in the number of resonators, and independent of the number of input samples processed or the duration of the processed signal. Furthermore, since the resonators are independent, there is no constraint on the tuning of their resonant frequencies or time constants, and all per-sample computations can be parallelized across resonators. The cumulative computational cost for a given duration increases linearly with the number of input samples processed. The low latency afforded by Resonate opens the door to real-time music and speech applications that are out of reach of FFT-based methods. The efficiency of the approach could reduce computational costs and inspire new designs for low-level audio processing layers in machine learning systems.
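The abstract describes the resonator only in words. The Python sketch below illustrates one plausible reading of it: each resonator multiplies the input by a complex phasor rotating at its resonant frequency and smooths the product with an EWMA whose weight `alpha` stands in for the single parameter. The names `Resonator`, `make_bank` and `process`, the heterodyning formulation, the phasor renormalization, and the factor of 2 in `amplitude` are all assumptions made for illustration; they are not the paper's published equations.

```python
import cmath
import math


class Resonator:
    """Hypothetical resonator: an EWMA of the input signal heterodyned by a
    complex phasor rotating at the resonant frequency. This is an assumed
    reading of the abstract, not the paper's published update equations."""

    def __init__(self, freq_hz: float, sample_rate: float, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.freq_hz = freq_hz
        self.alpha = alpha                    # the single (assumed) smoothing parameter
        self.phasor = 1.0 + 0.0j              # e^{-i*omega*n} at the current sample n
        # Multiplying by this constant advances the phasor by one sample.
        self.step = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
        self.state = 0.0 + 0.0j               # complex EWMA accumulator, O(1) memory

    def update(self, x: float) -> None:
        """Consume one input sample: a handful of arithmetic operations, no buffer."""
        # Recent samples get weight alpha; older contributions decay by (1 - alpha) per sample.
        self.state = (1.0 - self.alpha) * self.state + self.alpha * x * self.phasor
        self.phasor *= self.step
        self.phasor /= abs(self.phasor)       # renormalize to stop floating-point drift

    @property
    def amplitude(self) -> float:
        # A real sinusoid A*sin(omega*n) heterodynes to a constant of magnitude A/2,
        # so doubling recovers A for an on-resonance input.
        return 2.0 * abs(self.state)


def make_bank(f_min: float, n_bands: int, bands_per_octave: int,
              sample_rate: float, alpha: float) -> list[Resonator]:
    """Independent resonators tuned to geometrically spaced frequencies."""
    return [Resonator(f_min * 2.0 ** (k / bands_per_octave), sample_rate, alpha)
            for k in range(n_bands)]


def process(bank: list[Resonator], samples) -> list[float]:
    """Feed samples one at a time; the inner loop could run in parallel across resonators."""
    for x in samples:
        for r in bank:
            r.update(x)
    return [r.amplitude for r in bank]


if __name__ == "__main__":
    fs = 8000.0
    bank = make_bank(220.0, 25, 12, fs, alpha=0.001)   # 220 Hz to 880 Hz, semitone steps
    tone = (math.sin(2 * math.pi * 440.0 * n / fs) for n in range(int(fs)))  # 1 s at 440 Hz
    amps = process(bank, tone)
    best = max(range(len(amps)), key=amps.__getitem__)
    print(f"peak at {bank[best].freq_hz:.1f} Hz, amplitude {amps[best]:.3f}")
```

Run as a script, this feeds one second of a 440 Hz sine into 25 semitone-spaced resonators and should report the peak in the 440 Hz band with an amplitude close to 1. In this sketch, a smaller `alpha` narrows each resonator's bandwidth and lengthens its effective memory (roughly `1/alpha` samples), while a larger `alpha` reacts faster but separates neighboring bands less sharply. Memory and per-sample work both grow linearly with the number of resonators, matching the complexity stated in the abstract.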