Chapter 1: Introduction to the Java Sound API

Design Goals

The Java Sound API is a low-level API for effecting and controlling the input and output of sound media, including both audio and Musical Instrument Digital Interface (MIDI) data. The Java Sound API provides explicit control over the capabilities normally required for sound input and output, in a framework that promotes extensibility and flexibility.

Who is the Java Sound API For?

Because sound is so fundamental, the Java Sound API fulfills the needs of a wide range of application developers. Potential application areas include:

Communication frameworks, such as conferencing and telephony
End-user content delivery systems, such as media players and music using streamed content
Interactive application programs, such as games and Web sites that use dynamic content
Content creation and editing
Tools, toolkits, and utilities

How Does the Java Sound API Relate to Other Interfaces?

The Java Sound API provides the lowest level of sound support on the Java platform. It provides application programs with a great amount of control over sound operations, and it is extensible. For example, the Java Sound API supplies mechanisms for installing, accessing, and manipulating system resources such as audio mixers, MIDI synthesizers, other audio or MIDI devices, file readers and writers, and sound format converters. The Java Sound API does not include sophisticated sound editors or graphical tools, but it provides capabilities upon which such programs can be built. It emphasizes low-level control beyond that commonly expected by the end user.

There are other Java platform APIs that have sound-related elements. The Java Media Framework (JMF) is a higher-level API that is currently available as a Standard Extension to the Java platform. JMF specifies a unified architecture, messaging protocol, and programming interface for capturing and playing back time-based media. JMF provides a simpler solution for basic media-player application programs, and it enables synchronization between different media types, such as audio and video. On the other hand, programs that focus on sound can benefit from the Java Sound API, especially if they require more advanced features, such as the ability to carefully control buffered audio playback or directly manipulate a MIDI synthesizer. Other Java APIs with sound aspects include Java 3D and APIs for telephony and speech. An implementation of any of these APIs might use an implementation of the Java Sound API internally, but is not required to do so.

Packages

The Java Sound API includes support for both digital audio and MIDI data. These two major modules of functionality are provided in separate packages:

javax.sound.sampled This package specifies interfaces for capture, mixing, and playback of digital (sampled) audio.
javax.sound.midi This package provides interfaces for MIDI synthesis, sequencing, and event transport.

Two other packages permit service providers (as opposed to application developers) to create custom software components that extend the capabilities of an implementation of the Java Sound API:

javax.sound.sampled.spi
javax.sound.midi.spi

The rest of this chapter briefly discusses the sampled-audio system, the MIDI system, and the SPI packages. Each of these is then discussed in detail in a subsequent part of the guide.

Sampled Audio

What Is Sampled Audio?

The javax.sound.sampled package handles digital audio data, which the Java Sound API refers to as sampled audio. Samples are successive snapshots of a signal. In the case of audio, the signal is a sound wave. A microphone converts the acoustic signal into a corresponding analog electrical signal, and an analog-to-digital converter transforms that analog signal into a sampled digital form. The following figure shows a brief moment in a sound recording.

A Sampled Sound Wave

This graph plots sound pressure (amplitude) on the vertical axis, and time on the horizontal axis. The amplitude of the analog sound wave is measured periodically at a certain rate, resulting in the discrete samples (the red data points in the figure) that comprise the digital audio signal. The center horizontal line indicates zero amplitude; points above the line are positive-valued samples, and points below are negative. The accuracy of the digital approximation of the analog signal depends on its resolution in time (the sampling rate) and its quantization, or resolution in amplitude (the number of bits used to represent each sample). As a point of reference, the audio recorded for storage on compact discs is sampled 44,100 times per second and represented with 16 bits per sample.

The term "sampled audio" is used here slightly loosely. A sound wave could be sampled at discrete intervals while being left in an analog form. For purposes of the Java Sound API, however, "sampled audio" is equivalent to "digital audio."

Typically, sampled audio on a computer comes from a sound recording, but the sound could instead be synthetically generated (for example, to create the sounds of a touch-tone telephone). The term "sampled audio" refers to the type of data, not its origin.

Further information about the structure of digital audio data is given under "What Is Formatted Audio Data?" in Chapter 2, "Overview of the Sampled Package."

Audio Configurations

The Java Sound API does not assume a specific audio hardware configuration; it is designed to allow different sorts of audio components to be installed on a system and accessed by the API. The Java Sound API supports common functionality such as input and output from a sound card (for example, for recording and playback of sound files) as well as mixing of multiple streams of audio. Here is one example of a typical audio architecture:

The following context describes this graphic

A Typical Audio Architecture

In this example, a device such as a sound card has various input and output ports, and mixing is provided in the software. The mixer might receive data that has been read from a file, streamed from a network, generated on the fly by an application program, or produced by a MIDI synthesizer. (The javax.sound.midi package, discussed next, supplies a Java language interface for synthesizers.) The mixer combines all its audio inputs into a single stream, which can be sent to an output device for rendering.

MIDI

The javax.sound.midi package contains APIs for transporting and sequencing MIDI events, and for synthesizing sound from those events.

What Is MIDI?

Whereas sampled audio is a direct representation of a sound itself, MIDI data can be thought of as a recipe for creating a sound, especially a musical sound. MIDI data, unlike audio data, does not describe sound directly. Instead, it describes events that affect the sound a synthesizer is making. MIDI data is analogous to a graphical user interface's keyboard and mouse events. In the case of MIDI, the events can be thought of as actions upon a musical keyboard, along with actions on various pedals, sliders, switches, and knobs on that musical instrument. These events need not actually originate with a hardware musical instrument; they can be simulated in software, and they can be stored in MIDI files. A program that can create, edit, and perform these files is called a sequencer. Many computer sound cards include MIDI-controllable music synthesizer chips to which sequencers can send their MIDI events. Synthesizers can also be implemented entirely in software. The synthesizers interpret the MIDI events that they receive and produce audio output. Usually the sound synthesized from MIDI data is musical sound (as opposed to speech, for example). MIDI synthesizers are also capable of generating various kinds of sound effects.

Some sound cards include MIDI input and output ports to which external MIDI hardware devices (such as keyboard synthesizers or other instruments) can be connected. From a MIDI input port, an application program can receive events generated by an external MIDI-equipped musical instrument. The program might play the musical performance using the computer's internal synthesizer, save it to disk as a MIDI file, or render it into musical notation. A program might use a MIDI output port to play an external instrument, or to control other external devices such as recording equipment.

More information about MIDI data is given in Chapter 8, "Overview of the MIDI Package," particularly in the section "A MIDI Refresher: Wires and Files."

MIDI Configurations

The diagram below illustrates the functional relationships between the major components in a possible MIDI configuration based on the Java Sound API. (As with audio, the Java Sound API permits a variety of MIDI software devices to be installed and interconnected. The system shown here is one potential scenario.) The flow of data between components is indicated by arrows. The data can be in a standard file format, or (as indicated by the key in the lower right corner of the diagram), it can be audio, raw MIDI bytes, or time-tagged MIDI messages.

The following context describes this graphic

A Possible MIDI Configuration

In this example, the application program prepares a musical performance by loading a musical score that's stored as a standard MIDI file on a disk (left side of the diagram). Standard MIDI files contain tracks, each of which is a list of time-tagged MIDI events. Most of the events represent musical notes (pitches and rhythms). This MIDI file is read and then "performed" by a software sequencer. A sequencer performs its music by sending MIDI messages to some other device, such as an internal or external synthesizer. The synthesizer itself may read a soundbank file containing instructions for emulating the sounds of certain musical instruments. If not, the synthesizer will play the notes stored in the MIDI file using whatever instrument sounds are already loaded into the synthesizer.

As illustrated, the MIDI events must be translated into raw (non-time-tagged) MIDI before being sent through a MIDI output port to an external synthesizer. Similarly, raw MIDI data coming into the computer from an external MIDI source (a keyboard instrument, in the diagram) is translated into time-tagged MIDI messages that can control a synthesizer, or that a sequencer can store for later use. All these aspects of MIDI data flow are explained in detail in the subsequent chapters on MIDI (see Part II of this guide).

Service Provider Interfaces

The javax.sound.sampled.spi and javax.sound.midi.spi packages contain APIs that let software developers create new audio or MIDI resources that can be provided separately to the user and "plugged in" to an existing implementation of the Java Sound API. Here are some examples of services (resources) that can be added in this way:

An audio mixer
A MIDI synthesizer
A file parser that can read or write a new type of audio or MIDI file
A converter that translates between different sound data formats

In some cases, services are software interfaces to the capabilities of hardware devices, such as sound cards, and the service provider might be the same as the vendor of the hardware. In other cases, the services exist purely in software. For example, a synthesizer or a mixer could be an interface to a chip on a sound card, or it could be implemented without any hardware support at all.

An implementation of the Java Sound API contains a basic set of services, but the service provider interface (SPI) packages allow third parties to create new services. These third-party services are integrated into the system in the same way as the built-in services. The AudioSystem class in the sampled package and the MidiSystem class in the midi package act as coordinators that let application programs access the services explicitly or implicitly. Often the existence of a service is completely transparent to an application program that uses it. The service-provider mechanism benefits users of application programs based on the Java Sound API, because new sound features can be added to a program without requiring a new release of the JDK or runtime environment, and, in many cases, without even requiring a new release of the application program itself.