Automatic Transcription of Audio Signals

Master of Science Thesis

May 2004

Student

Jiří Vass

Czech Technical University in Prague

Faculty of Electrical Engineering

Department of Measurement

tel: +420 737 111030

e-mail: vassj@fel.cvut.cz

web: http://measure.feld.cvut.cz

Supervisors

Ing. Radim Špetík

Czech Technical University in Prague

Faculty of Electrical Engineering

Department of Circuit Theory

tel: +420 2 2435 2049

e-mail: radim.spetik@email.cz

web: http://amber.feld.cvut.cz

Hadas Ofir

Technion - Israel Institute of Technology

Department of Electrical Engineering

Signal and Image Processing Laboratory

e-mail: hadaso@siglab.technion.ac.il

web: http://www-sipl.technion.ac.il

Abstract

This thesis is concerned with automatic transcription of monophonic audio signals into the MIDI representation. The transcription system incorporates two separate algorithms in order to extract the necessary musical information from the audio signal. The detection of the fundamental frequency is based on a pattern recognition method applied to the constant Q spectral transform.

The onset detection is achieved by a sequential algorithm that computes a statistical distance measure between two autoregressive models. The results of both algorithms are combined using heuristic rules that eliminate transcription errors. Finally, new evaluation criteria are proposed and applied to the transcription results of several musical recordings.

Keywords: music transcription, pitch detection, fundamental frequency tracking, onset detection, monophonic audio

Acknowledgements

First of all, I would like to thank my parents and relatives for their immense support during my studies.

I would like to thank Ing. Radim Špetík for supervising this thesis.

I am also very grateful to Prof. Ing. Pavel Sovka, CSc. and Doc. Ing. Petr Pollák, CSc. for an excellent education, as well as for their help in designing the new criteria.

My greatest thanks go to my initial supervisor Hadas Ofir, who is the co-author of the proposed system. Special thanks also go to Nimrod Peleg for choosing the very best topic for my summer project in the SIPL laboratory. I am also very grateful to Heikki Jokinen, Pekka Kumpulainen and Anssi Klapuri for stimulating my interest in signal processing, and in audio applications in particular.

Thanks also go to my friends Michal Olexa and Václav Vozár for valuable hints concerning DSP, Matlab and TeX. And, as a nice Czech proverb says, "the best comes last": I would like to thank my ♥ Zuzka Lenochová.

This work is dedicated to all musicians I have had the opportunity and the pleasure to play with...


Declaration

I declare on my honour that I prepared this diploma thesis independently, solely under the supervision of Ing. Radim Špetík, and that in writing it I used no information sources other than those listed here.


Chapter 1 Introduction

1.1 Characterization of the Problem

Automatic transcription of music is the task of converting a particular piece of music into a symbolic representation by means of a computational system. The symbolic representation is generally the standard music notation, which consists of notes characterized by a specific frequency and duration. From the transcription point of view, music can be classified as polyphonic or monophonic. The former consists of multiple simultaneously sounding notes, whereas the latter contains only a single note at each time instant, such as a saxophone solo or the singing of a single vocalist.
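To make the notion of a symbolic representation concrete, the following minimal sketch stores one note as a pitch, an onset, and a duration, and maps a fundamental frequency to the nearest note of the equal-tempered scale. The `Note` class, the function name, and the 440 Hz tuning reference are assumptions made for illustration; they are not taken from the transcription system described in this thesis.

```python
import math
from dataclasses import dataclass

A4_FREQUENCY_HZ = 440.0  # assumed tuning reference (not stated in the thesis)
A4_MIDI_NUMBER = 69      # MIDI note number conventionally assigned to A4


@dataclass
class Note:
    """One transcribed note: pitch as a MIDI note number, timing in seconds."""
    midi_number: int
    onset_s: float
    duration_s: float


def frequency_to_midi(frequency_hz: float) -> int:
    """Round a fundamental frequency to the nearest equal-tempered MIDI note."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    semitones_from_a4 = 12.0 * math.log2(frequency_hz / A4_FREQUENCY_HZ)
    return A4_MIDI_NUMBER + round(semitones_from_a4)


# A 261.63 Hz tone held for half a second maps to middle C (MIDI note 60).
note = Note(frequency_to_midi(261.63), onset_s=0.0, duration_s=0.5)
```

A monophonic transcription can then be thought of as a list of such notes whose time intervals do not overlap.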





Automatic transcription of music is related to several fields of science, including Musicology, Psychoacoustics, and Computational Auditory Scene Analysis (CASA). It belongs to the discipline of music content analysis, which also includes other audio research topics such as rhythm analysis, instrument recognition, and sound separation. It has been studied since the 1970s.

1.2 Literature Review

The state of the art in music transcription is focused on polyphonic transcription, since monophonic transcription is considered practically solved [Klapuri1998], [Martins2001]. However, monophonic transcription remains an important case that should be treated separately, with much stricter demands on transcription quality, which still seems to be relatively limited for polyphonic transcribers. An extensive review of published polyphonic systems can be found in [Klapuri1998].

Since monophonic music shares various properties with speech, many algorithms suitable for music transcription originate in speech processing [Rabiner1976], [Hess1983], [Andre-Obrecht1986], [Medan1991]. Recent works in monophonic music transcription explore the potential of the wavelet transform [Cemgil1995a], [Cemgil1995b], [Jehan1997], time-domain techniques based on autocorrelation [Bello2002], and probabilistic modelling using Hidden Markov Models [Ryynänen2004]. In addition, [Bořil2003] developed a simple and robust algorithm for real-time MIDI conversion, referred to as the DFE algorithm (Direct Time Domain Fundamental Frequency Estimation). This system performs a separate monophonic analysis of the signal from each guitar string, and therefore illustrates that monophonic transcribers can be used in special polyphonic transcription systems.

1.3 Applications

Applications of automatic transcription systems are numerous, though currently limited by insufficient reliability and robustness. The following list presents the potential areas of interest.

• Computer music applications: A music transcription system is a useful tool for composers and musicians, since it provides a means to easily analyze and edit music recordings. It is especially attractive for real-time transcription of sounds into a musical score.

• Coding of audio signals: Converting signal samples to a symbolic representation significantly reduces the amount of data and can therefore be used for compression. An example is the structured audio (SA) coding described in the MPEG-4 standard.

• Mobile technology: Reliable transcription systems could be applied commercially in cellular phones to automatically create monophonic or polyphonic ringtones. Such a feature would allow customers to record their own musical compositions on a cellular phone and transmit the MIDI files via the Internet or the GSM network.

• Machine perception: By analogy with computer vision, the ability of computers to hear music would improve the interaction between humans and systems with artificial intelligence.

• Music teaching: Future transcription systems could be used in the training of singers and solo instrument players, and could assist in the ear training of novice musicians. Such systems would compare the exact musical notation with an artist's performance and objectively evaluate its quality.

1.4 Organization of the thesis

This thesis is oriented more toward practice than theory. It therefore explains only the essential background briefly and refers the reader to other publications, often available online. For this reason, it omits a separate theoretical chapter and defines the necessary terms "on the fly" during the description of the transcription system.

This thesis is organized as follows. Chapter 2 gives an overview of the MIDI standard. Chapter 3 presents the implemented solution. In Chapter 4, new criteria for evaluation are proposed and the transcription results are presented. Finally, Chapter 5 summarizes the accomplishments.

Chapter 2 The MIDI Standard

2.1 MIDI Introduction

The Musical Instrument Digital Interface (MIDI) provides a standardized means of conveying musical performance information as electronic data. It has been accepted and used by musicians and composers since its conception in 1983, and is nowadays widely used for communication between sound cards, musical keyboards, sequencers, and other electronic instruments. A complete description of the MIDI protocol is given in the MIDI 1.0 Specification, established and updated by the MIDI Manufacturers Association [MMA2004].

The main advantage of MIDI is data storage efficiency: a typical MIDI sequence requires approximately 10 Kbytes of data per minute of sound.

In contrast to WAV files, which contain digitally sampled audio in the PCM format, MIDI files consist of MIDI messages, which can be understood as special instructions telling a synthesizer which sounds to generate. These messages thus provide a very efficient symbolic representation of music. Moreover, MIDI files are editable, allowing the music to be rearranged or even composed interactively.
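The size difference can be estimated with a short calculation. The sketch below compares the figure of about 10 Kbytes per minute quoted above with one minute of PCM audio. The PCM parameters (44.1 kHz, 16 bits, stereo) are an assumption chosen for illustration; the text does not specify them.

```python
MIDI_BYTES_PER_MINUTE = 10 * 1024  # "approximately 10 Kbytes" per minute, from the text

# Assumed PCM parameters (CD quality); the thesis does not state them.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2

pcm_bytes_per_minute = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * 60
ratio = pcm_bytes_per_minute / MIDI_BYTES_PER_MINUTE

print(f"PCM:  {pcm_bytes_per_minute} bytes per minute")
print(f"MIDI: {MIDI_BYTES_PER_MINUTE} bytes per minute")
print(f"PCM is roughly {ratio:.0f} times larger")
```

Under these assumptions, the sampled audio is about three orders of magnitude larger than the MIDI sequence.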

2.2 MIDI Basics

The MIDI architecture consists of three main components: a hardware interface (connector), a communication protocol (language), and a distribution format (Standard MIDI File).

2.2.1 MIDI Hardware Interface

The MIDI interface of each instrument is generally provided by three MIDI connectors, labeled IN, OUT, and THRU. The only approved MIDI connector is a 5-pin DIN connector. The physical MIDI channel is divided into 16 logical channels, each capable of transmitting MIDI messages from and to a single musical instrument.

2.2.2 MIDI Communication Protocol

The MIDI data stream is a unidirectional asynchronous bit stream at 31.25 kbit/s, with 10 bits transmitted per byte (a start bit, 8 data bits, and one stop bit). The MIDI protocol is composed of MIDI messages in binary form; each message consists of an 8-bit status byte followed by one or two data bytes.
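The bit rate and framing given above determine how long a message occupies the serial line. The following sketch works through that arithmetic; the constant and function names are hypothetical and serve only as illustration.

```python
BIT_RATE = 31_250          # bits per second, from the text
BITS_PER_FRAMED_BYTE = 10  # start bit + 8 data bits + stop bit


def transmission_time_s(message_length_bytes: int) -> float:
    """Time needed to send a message of the given length over the MIDI line."""
    return message_length_bytes * BITS_PER_FRAMED_BYTE / BIT_RATE


bytes_per_second = BIT_RATE // BITS_PER_FRAMED_BYTE     # 3125 bytes per second
three_byte_message_ms = transmission_time_s(3) * 1000   # about 0.96 ms
```

A message with a status byte and two data bytes therefore takes slightly less than one millisecond to transmit.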

MIDI messages are processed in real time: when a MIDI synthesizer receives a note-on message, it plays the appropriate sound, and it stops this sound when the corresponding note-off message is received. Similarly, when a key is pressed on a musical instrument keyboard, a note-on message is generated immediately, and a note-off message is generated when the key is released. Therefore, no timing information is transmitted with the MIDI messages in real-time applications.
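As an illustration of the message layout, the sketch below assembles note-on and note-off messages. In the MIDI 1.0 protocol the upper four bits of the status byte select the message type (0x9 for note-on, 0x8 for note-off), the lower four bits carry the channel, and the two data bytes hold the note number and the velocity, each limited to 7 bits. The helper names and the default velocity values are assumptions made for this example.

```python
NOTE_OFF = 0x80  # status nibble for a note-off message
NOTE_ON = 0x90   # status nibble for a note-on message


def _channel_message(status_nibble: int, channel: int, data1: int, data2: int) -> bytes:
    """Build a three-byte channel message: status byte plus two 7-bit data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not (0 <= data1 <= 127 and 0 <= data2 <= 127):
        raise ValueError("data bytes must be in 0..127")
    return bytes([status_nibble | channel, data1, data2])


def note_on(channel: int, note: int, velocity: int = 64) -> bytes:
    """Message sent when a key is pressed (default velocity is an assumption)."""
    return _channel_message(NOTE_ON, channel, note, velocity)


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Message sent when the key is released."""
    return _channel_message(NOTE_OFF, channel, note, velocity)


# Middle C (note 60) on the first channel: pressed, then released.
pressed = note_on(0, 60, 100)
released = note_off(0, 60)
```

Neither message contains a time value, which matches the real-time behaviour described above: the receiver acts on each message as soon as it arrives.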

2.2.3 Standard MIDI Files

However, in order to store MIDI data in a file, the MIDI messages must be time-stamped to guarantee playback in the proper time sequence. In other words, each message is stored with a time value (a delta time relative to the preceding message, which can be expressed in units derived from the SMPTE format of hours : minutes : seconds : frames), and the resulting specification is referred to as the Standard MIDI File (SMF) format. In addition, the SMF specification defines three MIDI file formats, because MIDI sequencers can generally manage multiple MIDI data streams, called tracks.

• MIDI Format 0 stores all MIDI data in a single track, although it may represent several musical parts at different MIDI channels.

• MIDI Format 1 stores MIDI data as a collection of tracks (up to 256), each musical part separated in its own track.

• MIDI Format 2, which is relatively rare and often not supported, can store several independent songs.

Since this work is concerned with monophonic audio, only MIDI Format 0 is used, and the terms track and MIDI channel are interchangeable.
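A minimal Format 0 file can be assembled by hand, which shows how the time stamps are stored. In an SMF, every event is preceded by a delta time encoded as a variable-length quantity, and the file consists of a header chunk ("MThd") followed by track chunks ("MTrk"). The sketch below uses the ticks-per-quarter-note time division rather than the SMPTE-based one; the function names, the fixed velocity, and the back-to-back note layout are assumptions made for illustration and do not describe the file writer used in this work.

```python
import struct


def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer as an SMF variable-length quantity."""
    if value < 0:
        raise ValueError("delta time must be non-negative")
    groups = [value & 0x7F]               # last byte has its top bit cleared
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes have the top bit set
        value >>= 7
    return bytes(reversed(groups))


def build_format0_file(notes, ticks_per_quarter: int = 480) -> bytes:
    """Build a single-track file from (midi_number, duration_ticks) pairs.

    Notes are played back to back on channel 0 (an assumption of this sketch).
    ticks_per_quarter must stay below 0x8000 to select metrical timing.
    """
    track = bytearray()
    for midi_number, duration_ticks in notes:
        track += encode_vlq(0) + bytes([0x90, midi_number, 64])              # note-on
        track += encode_vlq(duration_ticks) + bytes([0x80, midi_number, 0])  # note-off
    track += encode_vlq(0) + bytes([0xFF, 0x2F, 0x00])  # end-of-track meta event
    # Header data: format 0, one track, time division in ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_quarter)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# Two quarter notes, C4 followed by D4.
smf_bytes = build_format0_file([(60, 480), (62, 480)])
```

Writing `smf_bytes` to a file opened in binary mode should yield a file that common MIDI players accept, although this has to be verified with the player at hand.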

It should also be noted that the MIDI files can be converted by the MIDI File Format Conversion Utility provided by [Glatt2004].

Finally, a MIDI file can also be understood as a ”musical version” of an ASCII text file, except that it contains binary data. Indeed, [Glatt2004] also offers the MIDI File Dis-Assembler Utility converting a MIDI file to a readable text, which can then be edited in a text editor and converted back to a modified MIDI file.

2.3 MIDI File Representations

Although there are many different types of MIDI messages, this work is concerned only with the note-on and note-off messages, which carry the musical note data and hence make up most of the traffic in a typical MIDI data stream. The remaining MIDI messages are used mainly for hardware tasks, such as selecting which instrument to play, mixing and panning sounds, and controlling various aspects of electronic musical instruments.
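The restriction to note-on and note-off messages can be sketched as a small filter that pairs the two message types into notes and skips everything else. The input format, a list of (time in seconds, raw message bytes) pairs, and the function name are hypothetical. The sketch also treats a note-on message with velocity 0 as a note-off, which the MIDI 1.0 protocol permits.

```python
from typing import Dict, List, Tuple

TimedMessage = Tuple[float, bytes]  # (time in seconds, raw MIDI message)


def extract_notes(messages: List[TimedMessage]) -> List[Tuple[int, float, float]]:
    """Pair note-on and note-off messages into (note, onset_s, duration_s) tuples."""
    active: Dict[Tuple[int, int], float] = {}  # (channel, note) -> onset time
    notes = []
    for time_s, message in messages:
        if len(message) != 3:
            continue  # not a three-byte channel message; ignored in this sketch
        status, note, velocity = message
        kind, channel = status & 0xF0, status & 0x0F
        if kind == 0x90 and velocity > 0:
            active[(channel, note)] = time_s
        elif kind == 0x80 or (kind == 0x90 and velocity == 0):
            onset = active.pop((channel, note), None)
            if onset is not None:
                notes.append((note, onset, time_s - onset))
        # any other status byte (controllers, pitch bend, ...) is skipped
    return notes


stream = [
    (0.0, bytes([0x90, 60, 100])),  # note-on, C4
    (0.5, bytes([0x80, 60, 0])),    # note-off, C4
    (0.5, bytes([0xC0, 5])),        # program change, skipped
    (0.5, bytes([0x90, 62, 100])),  # note-on, D4
    (1.0, bytes([0x90, 62, 0])),    # note-on with velocity 0, treated as note-off
]
transcribed = extract_notes(stream)
```

For the example stream, the filter yields two notes of half a second each and discards the program change message.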


