- Book author
- Vesa Välimäki
Abstract: Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music production software. Music signals are diverse—they comprise harmonic, percussive, and transient components, among others. Because of this wide range of acoustic and musical characteristics, there is no single TSM method that can cope with all kinds of audio signals equally well. Our main objective is to foster a better understanding of the capabilities and limitations of TSM procedures. To this end, we review fundamental TSM methods, discuss typical challenges, and indicate potential solutions that combine different strategies. In particular, we discuss a fusion approach that involves recent techniques for harmonic-percussive separation along with time-domain and frequency-domain TSM procedures.
Preface to “Audio Signal Processing”
This Special Issue gathers 20 fine contributions on audio signal processing, a topic which belongs to the Applied Acoustics section in this journal, Applied Sciences. These articles include revised and extended versions of three papers which won Best Paper Awards at the 2015 International Conference on Digital Audio Effects (DAFX-15) and also new versions of three papers which were presented at the 2015 International Computer Music Conference (ICMC-15).
Submissions were received from many parts of the world, such as Asia, Europe, and North America, and from some of the largest research units in this field, such as CCRMA (Stanford University, CA, USA), IRCAM (Paris, France), the International Audio Laboratories Erlangen (Erlangen, Germany), and the Center for Digital Music (Queen Mary University of London, UK). All manuscripts went through a rigorous peer review. Many fundamental topics in audio signal processing are dealt with in this collection, including active noise control, audio effects processing, automatic mixing, audio content analysis, equalizers, machine listening, music information retrieval, physical modeling of musical instruments, sound reproduction using headphones and loudspeakers, sound synthesis, spectral analysis, and virtual analog modeling.
The first three papers in this collection are review articles, which focus on time-scale modification (by Driedger and Müller), equalization (by Välimäki and Reiss), and feature extraction (by Alías et al.) applied to audio signals. The paper by Pieren et al. discusses auralization of traffic noise produced by a single car. Chun and Kim explain how to improve stereo sound reproduction using frequency-dependent amplitude panning. Stasis et al. present a second paper on audio equalization, concentrating on user-friendly control using high-level descriptors. An earlier version of this work received a Best Paper Award at the DAFX-15 conference.
The contribution by Gutierrez-Parera and Lopez investigates the effect of headphone quality on the perceived spatial sound. Antoñanzas et al. study active noise control in a room using several microphones and loudspeakers. Kim et al. propose methods to actively modify the sonic environment using acoustic feedback. Caetano et al. explore the synthesis of musical instrument sounds using a novel adaptive quasi-harmonic sinusoidal model. Gebhardt et al. won the First Prize at the DAFX-15 conference with their paper on harmonic music mixing using psychoacoustic principles. Desvages and Bilbao also received a Best Paper Award at DAFX-15 with their paper on detailed physical modeling of the violin using a finite difference scheme.
In the second paper on physical modeling, Medine considers two intrinsic problems in acoustic and audio system models: nonlinearities and delay-free loops. Rao et al. propose a new technique for automatic chord recognition from recorded music. Mesaros et al. tackle the evaluation of sound event detection systems. Werner and Abel develop sound effects processing methods, which are inspired by the Hammond organ. Thoret et al. suggest a control strategy for friction sound synthesis capable of producing noises similar to a creaky door or a singing glass, for example. Xu et al. write about a machine listening technique, which can be used for acoustic monitoring in a coal mine. Falaize and Hélie discuss the port-Hamiltonian approach to physical modeling of analog audio circuits, such as the diode clipper and the wah-wah pedal. In the last paper, Werner and Germain discover a way to improve the estimation of the frequency and amplitude of a sinusoid with the help of power scaling of the spectral data.
I am grateful to all contributors who made this Special Issue a success. The organizers of the DAFX-15 and ICMC-15 conferences, especially Prof. Sigurd Saue (General Chair, DAFX-15) and Prof. Richard Dudas (Paper Chair, ICMC-15), deserve special thanks for their collaboration, which made it possible to publish in this Special Issue some excellent works presented at those conferences. I also want to thank all reviewers; although they were busy with many tasks, they tirelessly helped to improve these manuscripts. Finally, thanks go to Jennifer Li and the rest of the Applied Sciences editorial team for their effective and friendly collaboration. I hope that this collection will serve as an inspiration for future research in audio signal processing.
Vesa Välimäki
Guest Editor
Audio signal processing is a highly active research field where digital signal processing theory meets human sound perception and real-time programming requirements. It has a wide range of applications in computers, gaming, and music technology, to name a few of the largest areas. Successful applications include, for example, perceptual audio coding, digital music synthesizers, and music recognition software. The fact that music is now often listened to using headphones from a mobile device leads to new problems related to background noise control and signal enhancement. Developments in processor technology, such as parallel computing, are changing the way signal-processing algorithms are designed for audio.
Topics covered included, but were not limited to, the following areas:
- Audio signal analysis
- Music information retrieval
- Enhancement and restoration of audio
- Audio equalization and filtering
- Audio effects processing
- Sound synthesis and modeling
- Audio coding
- Sound capture and noise control
- Sound source separation
- Room acoustics and spatial audio
- Signal processing for headphones and loudspeakers
- High-performance computing in audio
1. Introduction
Time-scale modification (TSM) procedures are digital signal processing methods for stretching or compressing the duration of a given audio signal. Ideally, the time-scale modified signal should sound as if the original signal’s content had been performed at a different tempo while preserving properties like pitch and timbre. TSM procedures are applied in a wide range of scenarios. For example, they simplify the process of creating music remixes. Music producers or DJs apply TSM to adjust the durations of music recordings, enabling synchronous playback [1,2]. Nowadays, TSM is built into music production software as well as hardware devices. A second application scenario is adjusting an audio stream’s duration to that of a given video clip. For example, when generating a slow motion video, it is often desirable to also slow down the tempo of the associated audio stream. Here, TSM can be used to synchronize the audio material with the video’s visual content [3].
A main challenge for TSM procedures is that music signals are complex sound mixtures, consisting of a wide range of different sounds. As an example, imagine a music recording consisting of a violin playing together with castanets. When modifying this music signal with a TSM procedure, both the harmonic sound of the violin and the percussive sound of the castanets should be preserved in the output signal. To keep the violin’s sound intact, it is essential to maintain its pitch as well as its timbre. The clicking sound of the castanets, on the other hand, does not have a pitch. For them, it is much more important to maintain the crisp sound of the single clicks, as well as their exact relative time positions, in order to preserve the original rhythm. Retaining these contrasting characteristics usually requires conceptually different TSM approaches. For example, classical TSM procedures based on waveform similarity overlap-add (WSOLA) [4] or on the phase vocoder (PV-TSM) [5–7] are capable of preserving the perceptual quality of harmonic signals to a high degree, but introduce noticeable artifacts when modifying percussive signals. However, it is possible to substantially reduce artifacts by combining different TSM approaches. For example, in [8], a given audio signal is first separated into a harmonic and a percussive component. Afterwards, each component is processed with a different TSM procedure that preserves its respective characteristics. The final output signal is then obtained by superimposing the two intermediate output signals.
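The separate-process-superimpose idea can be illustrated with a minimal NumPy sketch. It is not the implementation from [8]; it rests on several assumptions made here for illustration only: the separation uses median filtering of the magnitude spectrogram with a binary mask, the harmonic component is stretched with a basic phase vocoder (no phase locking), the percussive component is stretched with plain overlap-add using short frames, and all function names, frame lengths, and filter sizes are hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def grab_frames(x, alpha, n, hs):
    """Hann-windowed analysis frames, spaced hs/alpha samples apart."""
    out_len = int(round(len(x) * alpha))
    num = out_len // hs + 2
    pos = np.round(np.arange(num) * hs / alpha).astype(int)
    # Zero-pad so that frames are centred and the last frame stays in range.
    xp = np.pad(x, (n // 2, n + pos[-1]))
    idx = pos[:, None] + np.arange(n)[None, :]
    return xp[idx] * np.hanning(n), pos, out_len


def overlap_add(frames, hs, out_len, synth_win):
    """Place frames hs samples apart and normalise by the summed windows."""
    n = frames.shape[1]
    y = np.zeros(len(frames) * hs + n)
    norm = np.zeros_like(y)
    for m, frame in enumerate(frames):
        y[m * hs:m * hs + n] += frame * synth_win
        norm[m * hs:m * hs + n] += np.hanning(n) * synth_win
    y /= np.maximum(norm, 1e-3)
    return y[n // 2:n // 2 + out_len]


def ola_tsm(x, alpha, n=256):
    """Plain overlap-add TSM; short frames keep transients crisp."""
    hs = n // 2
    frames, _, out_len = grab_frames(x, alpha, n, hs)
    return overlap_add(frames, hs, out_len, np.ones(n))


def pv_tsm(x, alpha, n=2048):
    """Basic phase-vocoder TSM (assumed variant: no phase locking)."""
    hs = n // 4
    frames, pos, out_len = grab_frames(x, alpha, n, hs)
    X = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(X), np.angle(X)
    omega = 2 * np.pi * np.arange(X.shape[1]) / n  # bin frequency, rad/sample
    out_phase = phase.copy()  # the first frame keeps its original phase
    for m in range(1, len(X)):
        d = max(pos[m] - pos[m - 1], 1)  # actual analysis hop in samples
        dphi = phase[m] - phase[m - 1] - omega * d
        dphi = (dphi + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
        # Advance each bin by its estimated frequency over the synthesis hop.
        out_phase[m] = out_phase[m - 1] + hs * (omega + dphi / d)
    Y = mag * np.exp(1j * out_phase)
    return overlap_add(np.fft.irfft(Y, n=n, axis=1), hs, out_len, np.hanning(n))


def median_filter(S, size, axis):
    """Running median of odd length `size` along one axis of a 2-D array."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (size // 2, size // 2)
    P = np.pad(S, pad, mode="reflect")
    return np.median(sliding_window_view(P, size, axis=axis), axis=-1)


def hps(x, n=1024, med=17):
    """Split x into harmonic and percussive parts with a binary mask."""
    hs = n // 4
    frames, _, out_len = grab_frames(x, 1.0, n, hs)
    X = np.fft.rfft(frames, axis=1)  # shape: (time frames, frequency bins)
    S = np.abs(X)
    harm = median_filter(S, med, axis=0)  # smooth along time
    perc = median_filter(S, med, axis=1)  # smooth along frequency
    mask = harm >= perc

    def back(Z):
        return overlap_add(np.fft.irfft(Z, n=n, axis=1), hs, out_len, np.hanning(n))

    return back(X * mask), back(X * ~mask)


def hps_tsm(x, alpha):
    """Stretch each component with a suitable method, then superimpose."""
    x_h, x_p = hps(x)
    return pv_tsm(x_h, alpha) + ola_tsm(x_p, alpha)


# Toy input: a 440 Hz tone (harmonic) plus sparse clicks (percussive).
sr = 22050
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440 * t)
x[::4410] += 1.0
y = hps_tsm(x, alpha=1.5)  # 1.5 times longer, same pitch
print(len(x), len(y))
```

Because the two masks are complementary, the harmonic and percussive components sum back to the input signal, so nothing is lost before the two TSM procedures are applied. The long frames in `pv_tsm` favour frequency resolution for the tonal part, while the short frames in `ola_tsm` favour time resolution for the clicks.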
Our goals in this article are twofold. First, we aim to foster an understanding of fundamental challenges and algorithmic approaches in the field of TSM by reviewing well-known TSM methods and discussing their respective advantages and drawbacks in detail. Second, having identified the core issues of these classical procedures, we show, through an example, how to improve on them by combining different algorithmic ideas. We begin the article by introducing a fundamental TSM strategy as used in many TSM procedures (Section 2) and discussing a simple TSM approach based on overlap-add (Section 3). Afterwards, we review two conceptually different TSM methods: the time-domain WSOLA (Section 4) and the frequency-domain PV-TSM (Section 5). We then review the state-of-the-art TSM procedure from [8] that improves on the quality of both WSOLA and PV-TSM by incorporating harmonic-percussive separation (Section 6). Finally, we point out different application scenarios for TSM (such as music synchronization and pitch-shifting), as well as various freely available TSM implementations (Section 7).