man pages section 1: User Commands

Updated: July 2014

sox (1)



Sound eXchange                                             SoX(1)

     SoX - Sound eXchange, the Swiss Army knife of audio manipulation

     sox [global-options] [format-options] infile1
          [[format-options] infile2] ... [format-options] outfile
          [effect [effect-options]] ...

     play [global-options] [format-options] infile1
          [[format-options] infile2] ... [format-options]
          [effect [effect-options]] ...

     rec [global-options] [format-options] outfile
          [effect [effect-options]] ...

     SoX reads and writes audio files in most popular formats and
     can  optionally apply effects to them. It can combine multi-
     ple input sources, synthesise audio, and, on  many  systems,
     act as a general purpose audio player or a multi-track audio
     recorder. It also has limited ability  to  split  the  input
     into multiple output files.

     All  SoX  functionality is available using just the sox com-
     mand.  To simplify playing and recording audio,  if  SoX  is
     invoked  as play, the output file is automatically set to be
     the default sound device, and if invoked as rec, the default
     sound  device is used as an input source.  Additionally, the
     soxi(1) command provides a  convenient  way  to  just  query
     audio file header information.

     The  heart  of SoX is a library called libSoX.  Those inter-
     ested in extending SoX or using it in other programs  should
     refer to the libSoX manual page: libsox(3).

     SoX  is  a  command-line audio processing tool, particularly
     suited to making quick, simple edits and to  batch  process-
     ing.   If  you  need an interactive, graphical audio editor,
     use audacity(1).
                         *        *        *

     The overall SoX processing chain can be summarised as follows:

             Input(s) -> Combiner -> Effects -> Output(s)

     Note however, that on the SoX command line, the positions of
     the Output(s) and the Effects are swapped w.r.t. the logical
     flow  just  shown.  Note also that whilst options pertaining
     to files are placed before their respective file  name,  the
     opposite  is  true  for  effects.  To show how this works in

sox               Last change: February 19, 2011                1

     practice, here is a selection of examples of how  SoX  might
     be used.  The simple

        sox recital.au recital.wav

     translates an audio file in Sun AU format to a Microsoft WAV
     file, whilst

        sox recital.au -b 16 recital.wav channels 1 rate 16k fade 3 norm

     performs the same format translation, but also applies  four
     effects  (down-mix to one channel, sample rate change, fade-
     in, normalize), and stores the result at a bit-depth of 16.

        sox -r 16k -e signed -b 8 -c 1 voice-memo.raw voice-memo.wav

     converts  `raw'  (a.k.a.  `headerless')  audio  to  a  self-
     describing file format,

        sox slow.aiff fixed.aiff speed 1.027

     adjusts audio speed,

        sox short.wav long.wav longer.wav

     concatenates two audio files, and

        sox -m music.mp3 voice.wav mixed.flac

     mixes together two audio files.

        play "The Moonbeams/Greatest/*.ogg" bass +3

     plays  a  collection  of  audio files whilst applying a bass
     boosting effect,

        play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1

     plays a synthesised `A minor seventh'  chord  with  a  pipe-
     organ sound,

        rec -c 2 radio.aiff trim 0 30:00

     records half an hour of stereo audio, and

        play -q take1.aiff & rec -M take1.aiff take1-dub.aiff

     (with POSIX shell and where supported by hardware) records a
     new track in a multi-track recording.  Finally,

        rec -r 44100 -b 16 -s -p silence 1 0.50 0.1% 1 10:00 0.1% | \
          sox -p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \
          newfile : restart

     records a stream of audio such as LP/cassette and splits it
     into multiple audio files at points with 2 seconds of silence.
     Also, it does not start recording until it detects audio  is
     playing and stops after it sees 10 minutes of silence.

     N.B.   The  above is just an overview of SoX's capabilities;
     detailed explanations of how to use all SoX parameters, file
     formats,  and  effects can be found below in this manual, in
     soxformat(4), and in soxi(1).

  File Format Types
     SoX can work with `self-describing' and `raw'  audio  files.
     `self-describing'  formats  (e.g.  WAV,  FLAC,  MP3)  have a
     header that completely describes  the  signal  and  encoding
     attributes of the audio data that follows. `raw' or `header-
     less' formats do not contain this information, so the  audio
     characteristics  of  these must be described on the SoX com-
     mand line or inferred from those of the input file.

     The following four characteristics are used to describe  the
     format of audio data such that it can be processed with SoX:

     sample rate
          The sample rate  in  samples  per  second  (`Hertz'  or
          `Hz').   Digital  telephony traditionally uses a sample
          rate of 8000 Hz (8 kHz), though these days, 16 and even
          32 kHz  are  becoming  more common. Audio Compact Discs
          use 44100 Hz (44.1 kHz). Digital Audio  Tape  and  many
          computer systems use 48 kHz. Professional audio systems
          often use 96 kHz.

     sample size
          The number of bits used to store each  sample.   Today,
          16-bit is commonly used. 8-bit was popular in the early
          days of computer audio. 24-bit is used in  the  profes-
          sional audio arena. Other sizes are also used.

     data encoding
          The  way  in which each audio sample is represented (or
          `encoded').  Some encodings have variants with  differ-
          ent byte-orderings or bit-orderings.  Some compress the
          audio data so that the stored audio data takes up  less
          space  (i.e. disk space or transmission bandwidth) than
          the other format parameters and the number  of  samples
          would  imply.   Commonly-used  encoding  types  include
          floating-point, μ-law, ADPCM, signed-integer  PCM,  MP3,
          and FLAC.

     channels
          The  number  of  audio  channels contained in the file.
          One (`mono') and two (`stereo') are widely used.  `Surround
          sound' audio typically contains six or more channels.

     The term `bit-rate' is a measure of the  amount  of  storage
     occupied by an encoded audio signal over a unit of time.  It
     can depend on all of the above and is typically denoted as a
     number  of  kilo-bits per second (kbps).  An A-law telephony
     signal has a bit-rate of 64 kbps.  MP3-encoded stereo music
     typically  has  a  bit-rate  of  128-196  kbps. FLAC-encoded
     stereo music typically has a bit-rate of 550-760 kbps.
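
     For uncompressed audio, the bit-rate follows directly from
     the characteristics listed above: sample rate times sample
     size times number of channels.  The sketch below does that
     arithmetic in POSIX shell; the function name pcm_kbps is
     illustrative only and is not part of SoX.

```shell
#!/bin/sh
# pcm_kbps RATE_HZ BITS CHANNELS
# Hypothetical helper (not part of SoX): bit-rate of uncompressed
# audio in kilo-bits per second, i.e. rate * bits * channels / 1000.
pcm_kbps() {
    awk -v r="$1" -v b="$2" -v c="$3" \
        'BEGIN { printf "%.1f\n", r * b * c / 1000 }'
}

pcm_kbps 8000 8 1      # 8-bit mono telephony:  64.0
pcm_kbps 44100 16 2    # 16-bit stereo CD audio: 1411.2
```

     The first result matches the 64 kbps quoted for A-law
     telephony, which stores 8 bits per sample at 8 kHz.  Lossy
     and lossless compressed formats fall below the uncompressed
     figure, as the MP3 and FLAC ranges above show.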

     Most self-describing formats also allow  textual  `comments'
     to  be embedded in the file that can be used to describe the
     audio in some way, e.g. for music, the title, the author, and so on.

     One  important  use  of  audio  file  comments  is to convey
     `Replay Gain' information.   SoX  supports  applying  Replay
     Gain  information,  but  not  generating  it.   Note that by
     default, SoX copies input file comments to output files that
     support  comments,  so  output files may contain Replay Gain
     information if some was present in the input file.  In  this
     case,  if anything other than a simple format conversion was
     performed then the output file Replay  Gain  information  is
     likely to be incorrect and so should be recalculated using a
     tool that supports this (not SoX).

     The soxi(1) command can be used to display information  from
     audio file headers.

  Determining & Setting The File Format
     There  are  several  mechanisms  available for SoX to use to
     determine or set the  format  characteristics  of  an  audio
     file.  Depending on the circumstances, individual character-
     istics may be determined or set using different  mechanisms.

     To  determine  the format of an input file, SoX will use, in
     order of precedence and as given or available:

     1.  Command-line format options.

     2.  The contents of the file header.

     3.  The filename extension.

     To set the output file format, SoX will  use,  in  order  of
     precedence and as given or available:

     1.  Command-line format options.

     2.  The filename extension.

     3.  The  input  file  format characteristics, or the closest
         that is supported by the output file type.

     For all files, SoX will exit with an error if the file  type
     cannot  be  determined. Command-line format options may need
     to be added or changed to resolve the problem.

  Playing & Recording Audio
     The play and rec commands are provided so that basic playing
     and recording is as simple as

        play existing-file.wav


        rec new-file.wav

     These two commands are functionally equivalent to

        sox existing-file.wav -d


        sox -d new-file.wav

     Of  course, further options and effects (as described below)
     can be added to the commands in either form.
                         *        *        *

     Some systems provide more than one type of  (SoX-compatible)
     audio  driver,  e.g. ALSA & OSS, or SUNAU & AO.  Systems can
     also have more than one audio device (a.k.a. `sound  card').
     If  more than one audio driver has been built-in to SoX, and
     the default selected by SoX when recording or playing is not
     the  one  that  is  wanted, then the AUDIODRIVER environment
     variable can be used to override the default.   For  example
     (on many systems):

        set AUDIODRIVER=oss
        play ...

     The  AUDIODEV  environment  variable can be used to override
     the default audio device, e.g.

        set AUDIODEV=/dev/dsp2
        play ...
        sox ... -t oss


        set AUDIODEV=hw:soundwave,1,2
        play ...
        sox ... -t alsa

     Note that the way of setting  environment  variables  varies
     from  system  to  system  -  for some specific examples, see
     `SOX_OPTS' below.

     When playing a file with a sample rate that is not supported
     by  the  audio  output device, SoX will automatically invoke
     the rate effect to perform the necessary sample rate conver-
     sion.  For compatibility with old hardware, the default rate
     quality level is set  to  `low'.  This  can  be  changed  by
     explicitly specifying the rate effect with a different qual-
     ity level, e.g.

        play ... rate -m

     or by using the --play-rate-arg option (see below).
                         *        *        *

     On some systems, SoX allows  audio  playback  volume  to  be
     adjusted  whilst  using  play.   Where  supported,  this  is
     achieved by tapping the `v' & `V' keys during playback.

     To  help  with  setting  a  suitable  recording  level,  SoX
     includes  a  peak-level  meter  which can be invoked (before
     making the actual recording) as follows:

        rec -n

     The recording level should be adjusted  (using  the  system-
     provided  mixer  program,  not  SoX) so that the meter is at
     most occasionally full scale, and never  `in  the  red'  (an
     exclamation mark is shown).  See also -S below.

     Many  file  formats  that compress audio discard some of the
     audio signal information whilst doing so. Converting to such
     a  format and then converting back again will not produce an
     exact copy of the original audio.  This is the case for many
     formats  used in telephony (e.g.  A-law, GSM) where low sig-
     nal bandwidth is more important than  high  audio  fidelity,
     and  for  many  formats used in portable music players (e.g.
     MP3, Vorbis) where adequate fidelity can  be  retained  even
     with  the  large  compression ratios that are needed to make
     portable players practical.

     Formats that discard audio  signal  information  are  called
     `lossy'.   Formats  that  do not are called `lossless'.  The
     term `quality' is used as a measure of how closely the original
     audio signal can be reproduced when using a lossy format.

     Audio file conversion with SoX is lossless when it  can  be,
     i.e. when not using lossy compression, when not reducing the
     sampling rate or number of channels, and when the number  of
     bits  used in the destination format is not less than in the
     source format.  E.g.  converting from an 8-bit PCM format to
     a 16-bit PCM format is lossless but converting from an 8-bit
     PCM format to (8-bit) A-law isn't.

     N.B.  SoX converts all audio files  to  an  internal  uncom-
     pressed  format before performing any audio processing. This
     means that manipulating a file that is  stored  in  a  lossy
     format  can  cause  further  losses in audio fidelity.  E.g. with

        sox long.mp3 short.mp3 trim 10

     SoX first decompresses the input MP3 file, then applies  the
     trim  effect, and finally creates the output MP3 file by re-
     compressing  the  audio  -  with  a  possible  reduction  in
     fidelity  above  that which occurred when the input file was
     created.  Hence, if what is ultimately  desired  is  lossily
     compressed  audio,  it  is highly recommended to perform all
     audio processing using lossless file formats and  then  con-
     vert to the lossy format only at the final stage.

     N.B.  Applying multiple effects with a single SoX invocation
     will, in general, produce more accurate results  than  those
     produced using multiple SoX invocations.

     Dithering  is a technique used to maximise the dynamic range
     of audio stored at a particular  bit-depth.  Any  distortion
     introduced by quantisation is decorrelated by adding a small
     amount of white noise to the signal.  In most cases, SoX can
     determine  whether  the  selected processing requires dither
     and will add it during output formatting if appropriate.

     Specifically, by default, SoX automatically adds TPDF dither
     when  the  output  bit-depth  is less than 24 and any of the
     following are true:

     o   bit-depth reduction has been specified explicitly  using
         a command-line option

     o   the  output  file  format supports only bit-depths lower
         than that of the input file format

     o   an effect has increased effective bit-depth  within  the
         internal processing chain

     For  example,  adjusting  volume  with vol 0.25 requires two
     additional bits in which to  losslessly  store  its  results
     (since  0.25  decimal  equals 0.01 binary).  So if the input
     file bit-depth is 16,  then  SoX's  internal  representation
     will  utilise  18  bits after processing this volume change.
     In order to store the output at the same depth as the input,
     dithering is used to remove the additional bits.
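
     The two-bit figure can be checked with a little arithmetic.
     The sketch below is illustrative only (extra_bits is a
     hypothetical helper, not a SoX command) and handles just the
     simple case of a volume factor of the form 1/2^k, for which
     exactly k additional bits are needed.

```shell
#!/bin/sh
# extra_bits FACTOR
# Hypothetical helper: for a volume factor of the form 1/2^k, print k,
# the number of additional bits needed to store the result exactly.
extra_bits() {
    awk -v f="$1" 'BEGIN {
        bits = 0
        # Double the factor until it reaches (or passes) 1.
        while (f < 1 && bits < 64) { f *= 2; bits++ }
        if (f == 1) print bits
        else        print "not of the form 1/2^k"
    }'
}

extra_bits 0.25                          # 2
echo "$((16 + $(extra_bits 0.25)))"      # 18 bits after vol 0.25 on 16-bit input
```

     Other factors are deliberately rejected by this sketch
     rather than given a bit count.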

     Use  the  -V option to see what processing SoX has automati-
     cally added. The -D option may be given  to  override  auto-
     matic  dithering.   To  invoke  dithering  manually (e.g. to
     select a noise-shaping curve), see the dither effect.

     Clipping is distortion that  occurs  when  an  audio  signal
     level  (or  `volume') exceeds the range of the chosen repre-
     sentation.  In most cases, clipping is  undesirable  and  so
     should  be  corrected  by  adjusting  the level prior to the
     point (in the processing chain) at which it occurs.

     In SoX, clipping could occur,  as  you  might  expect,  when
     using  the vol or gain effects to increase the audio volume.
     Clipping could also occur with many other effects, when con-
     verting  one format to another, and even when simply playing
     the audio.

     Playing an audio file often involves  resampling,  and  pro-
     cessing by analogue components can introduce a small DC off-
     set and/or amplification, all of which can  produce  distor-
     tion  if  the  audio signal level was initially too close to
     the clipping point.

     For these reasons, it is usual to make sure  that  an  audio
     file's  signal  level  has some `headroom', i.e. it does not
     exceed a particular level below the maximum  possible  level
     for the given representation.  Some standards bodies recommend
     as much as 9dB headroom, but in most cases, 3dB (≈70%
     linear) is enough.  Note that this wisdom seems to have been
     lost in modern music production; in fact,  many  CDs,  MP3s,
     etc.   are now mastered at levels above 0dBFS i.e. the audio
     is clipped as delivered.
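
     Headroom figures in dB can be converted to a linear fraction
     of full scale with the standard amplitude relation
     10^(dB/20).  The following sketch (db_to_linear is a
     hypothetical helper name, not part of SoX) shows where the
     70% figure comes from.

```shell
#!/bin/sh
# db_to_linear DB
# Hypothetical helper: convert an amplitude level in dB relative to
# full scale into a linear fraction, using 10^(dB/20).
db_to_linear() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

db_to_linear -3    # 0.708, i.e. about 70% of full scale
db_to_linear -9    # 0.355
```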

     SoX's stat and stats effects can assist in  determining  the
     signal level in an audio file. The gain or vol effect can be
     used to prevent clipping, e.g.

        sox dull.wav bright.wav gain -6 treble +6

     guarantees that the treble boost will not clip.

     If clipping occurs at any point during processing, SoX  will
     display a warning message to that effect.

     See also -G and the gain and norm effects.

  Input File Combining
     SoX's  input  combiner can be configured (see OPTIONS below)
     to combine multiple files using any of the  following  meth-
     ods: `concatenate', `sequence', `mix', `mix-power', `merge',
     or `multiply'.  The default method is `sequence'  for  play,
     and `concatenate' for rec and sox.

     For  all methods other than `sequence', multiple input files
     must have the same sampling rate. If necessary, separate SoX
     invocations  can  be  used to make sampling rate adjustments
     prior to combining.

     If the `concatenate' combining method is selected  (usually,
     this will be by default) then the input files must also have
     the same number of channels.  The audio from each input will
     be  concatenated in the order given to form the output file.

     The `sequence' combining method  is  selected  automatically
     for  play.  It is similar to `concatenate' in that the audio
     from each input file is sent serially to  the  output  file.
     However,  here the output file may be closed and reopened at
     the corresponding transition between input files.  This  may
     be just what is needed when sending different types of audio
     to an output device, but is not generally  useful  when  the
     output is a normal file.

     If  either  the  `mix'  or  `mix-power'  combining method is
     selected then two or more input files must be given and will
     be  mixed  together  to form the output file.  The number of
     channels in each input file need not be the  same,  but  SoX
     will  issue  a  warning if they are not and some channels in
     the output file will not  contain  audio  from  every  input
     file.   A mixed audio file cannot be un-mixed without refer-
     ence to the original input files.

     If the `merge' combining method is selected then two or more
     input  files  must  be  given and will be merged together to
     form the output file.  The number of channels in each  input
     file  need  not  be the same.  A merged audio file comprises
     all of the channels from all of the input files.  Un-merging
     is possible using multiple invocations of SoX with the remix
     effect.  For example, two mono files could be merged to form
     one  stereo  file.  The  first  and  second mono files would
     become the left and right channels of the stereo file.

     The `multiply' combining method multiplies the sample values
     of  corresponding channels (treated as numbers in the inter-
     val -1 to +1).  If the number of channels in the input files
     is not the same, the missing channels are considered to con-
     tain all zero.

     When  combining  input  files,  SoX  applies  any  specified
     effects  (including,  for example, the vol volume adjustment
     effect) after the audio has been combined.  However,  it  is
     often useful to be able to set the volume of (i.e. `balance')
     the inputs individually, before combining takes place.

     For all combining methods, input file volume adjustments can
     be made manually using the -v option (below)  which  can  be
     given  for  one or more input files. If it is given for only
     some of the input files then the others  receive  no  volume
     adjustment.  In some circumstances, automatic volume adjust-
     ments may be applied (see below).

     The -V option (below) can be used to  show  the  input  file
     volume  adjustments that have been selected (either manually
     or automatically).

     There are some special considerations that need to be made when
     mixing input files:

     Unlike  the other methods, `mix' combining has the potential
     to cause clipping in the combiner if no  balancing  is  per-
     formed.   In this case, if manual volume adjustments are not
     given, SoX will try to ensure that clipping does  not  occur
     by  automatically  adjusting  the volume (amplitude) of each
     input signal by a factor of 1/n, where n is the number of
     input  files.  If this results in audio that is too quiet or
     otherwise unbalanced then the input file volumes can be  set
     manually  as  described  above. Using the norm effect on the
     mix is another alternative.

     If mixed audio seems loud enough  at  some  points  but  too
     quiet  in  others  then  dynamic range compression should be
     applied to correct this - see the compand effect.

     With the `mix-power' combine method,  the  mixed  volume  is
     approximately  equal  to  that  of one of the input signals.
     This is achieved by balancing using a factor of 1/√n instead
     of 1/n.  Note that this balancing factor does not guarantee
     that clipping will not occur, but the number of  clips  will
     usually  be  low  and  the resultant distortion is generally
     imperceptible.
  Output Files
     SoX's default behaviour is to take one or more  input  files
     and write them to a single output file.

     This  behaviour  can  be  changed  by specifying the pseudo-
     effect `newfile' within the effects  list.   SoX  will  then
     enter multiple output mode.

     In  multiple  output  mode,  a  new file is created when the
     effects prior to the `newfile' indicate they are done.   The
     effects  chain listed after `newfile' is then started up and
     its output is saved to the new file.

     In multiple output mode, a unique number will  automatically
     be  appended  to  the end of all filenames.  If the filename
     has an extension then the  number  is  inserted  before  the
     extension.  This behaviour can be customized by placing a %n
     anywhere in the filename where the number should be  substi-
     tuted.   An  optional  number  can  be placed after the % to
     indicate a minimum fixed width for the number.

     Multiple output mode is not very  useful  unless  an  effect
     that  will  stop the effects chain early is specified before
     the `newfile'. If end of file is reached before the  effects
     chain  stops  itself  then no new file will be created as it
     would be empty.

     The following is an example of splitting the first  60  sec-
     onds  of an input file into two 30 second files and ignoring
     the rest.

        sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
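
     The same pattern should extend to more than two pieces by
     repeating `: newfile : trim 0 LENGTH'.  The sketch below only
     builds and prints such an effects chain; it does not run SoX.
     The helper name split_chain and the three-way split are
     assumptions extrapolated from the example above.

```shell
#!/bin/sh
# split_chain COUNT SECONDS
# Hypothetical helper: print an effects chain that would write COUNT
# consecutive files of SECONDS each, separated by the newfile
# pseudo-effect.
split_chain() {
    i=1
    chain="trim 0 $2"
    while [ "$i" -lt "$1" ]; do
        chain="$chain : newfile : trim 0 $2"
        i=$((i + 1))
    done
    printf '%s\n' "$chain"
}

split_chain 2 30    # trim 0 30 : newfile : trim 0 30

# Dry run: show the full command for three 20-second pieces.
printf '%s\n' "sox song.wav ringtone%1n.wav $(split_chain 3 20)"
```

     With a count of 2 and a length of 30 the helper reproduces
     the chain used in the example above.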

  Stopping SoX
     Usually SoX will complete its processing and exit  automati-
     cally  once  it  has  read all available audio data from the
     input files.

     If desired, it can  be  terminated  earlier  by  sending  an
     interrupt  signal  to  the  process (usually by pressing the
     keyboard interrupt key which is normally Ctrl-C).  This is a
     natural  requirement  in some circumstances, e.g. when using
     SoX to make a recording.  Note that when using SoX  to  play
     multiple  files, Ctrl-C behaves slightly differently: press-
     ing it once causes SoX to skip to the next file; pressing it
     twice in quick succession causes SoX to exit.

     Another  option to stop processing early is to use an effect
     that has a time period or  sample  count  to  determine  the
     stopping point. The trim effect is an example of this.  Once
     all effects chains have stopped then SoX will also stop.

     Filenames can be simple file  names,  absolute  or  relative
     path  names, or URLs (input files only).  Note that URL sup-
     port requires that wget(1) is available.

     Note: Giving SoX an input or output  filename  that  is  the
     same as a SoX effect-name will not work since SoX will treat
     it as an effect specification.  The only work-around to this

     is  to avoid such filenames. This is generally not difficult
     since most audio  filenames  have  a  filename  `extension',
     whilst effect-names do not.

  Special Filenames
     The following special filenames may be used in certain
     circumstances in place of a normal filename on the command line:

     -    SoX  can be used in simple pipeline operations by using
          the special filename `-' which, if  used  as  an  input
          filename, will cause SoX to read audio data from
          `standard input' (stdin), and which,  if  used  as  the
          output filename, will cause SoX to send audio data to
          `standard output' (stdout).  Note that when using  this
          option for the output file, and sometimes when using it
          for an input file, the file-type (see  -t  below)  must
          also be given.

     "|program [options] ..."
          This can be used in place of an input filename to specify
          that the given program's standard output (stdout) be
          used  as  an input file.  Unlike - (above), this can be
          used for several inputs to one SoX command.  For  exam-
          ple,  if `genw' generates mono WAV formatted signals to
          its standard output, then the following command makes a
          stereo file from two generated signals:

             sox -M "|genw --imd -" "|genw --thd -" out.wav

          For headerless (raw) audio, -t (and perhaps other format
          options) will need to be given, preceding the input command.

     "wildcard"
          Specifies that filename `globbing' (wild-card matching)
          should be performed by SoX instead  of  by  the  shell.
          This  allows a single set of file options to be applied
          to a group of  files.   For  example,  if  the  current
          directory   contains   three  `vox'  files,  file1.vox,
          file2.vox, and file3.vox, then

             play --rate 6k *.vox

          will be expanded by the `shell' (in most environments) to

             play --rate 6k file1.vox file2.vox file3.vox

          which  will  treat  only the first vox file as having a
          sample rate of 6k.  With

             play --rate 6k "*.vox"

          the given sample rate option will  be  applied  to  all
          three vox files.

     -p, --sox-pipe
          This  can  be  used  in  place of an output filename to
          specify that the SoX command should be used as an input
          pipe to another SoX command.  For example, the command:

             play "|sox -n -p synth 2" "|sox -n -p synth 2 tremolo 10" stat

          plays two `files' in succession, each with different effects.

          -p is in fact an alias for `-t sox -'.

     -d, --default-device
          This  can  be used in place of an input or output file-
          name to specify that the default audio device  (if  one
          has  been  built into SoX) is to be used.  This is akin
          to invoking rec or play (as described above).

     -n, --null
          This can be used in place of an input or  output  file-
          name to specify that a `null file' is to be used.  Note
          that here, `null file' refers to a SoX-specific  mecha-
          nism  and is not related to any operating-system mecha-
          nism with a similar name.

          Using a null file to input audio is equivalent to using
          a normal audio file that contains an infinite amount of
          silence, and as such is  not  generally  useful  unless
          used with an effect that specifies a finite time length
          (such as trim or synth).

          Using a null file to output audio amounts to discarding
          the  audio  and is useful mainly with effects that pro-
          duce information about the audio instead  of  affecting
          it (such as noiseprof or stat).

          The  sampling  rate  associated  with a null file is by
          default 48 kHz, but, as with a normal file, this can be
          overridden if desired using command-line format options
          (see below).

  Supported File & Audio Device Types
     See soxformat(4) for a list and description of the supported
     file formats and audio device drivers.


  Global Options
     These  options  can  be specified on the command line at any
     point before the first effect name.

     The SOX_OPTS environment variable can  be  used  to  provide
     alternative  default  values  for SoX's global options.  For
     example:

        SOX_OPTS="--buffer 20000 --play-rate-arg -hs --temp /mnt/temp"

     Note that setting SOX_OPTS can potentially  create  unwanted
     changes  in  the behaviour of scripts or other programs that
     invoke SoX.  SOX_OPTS might best be used for things (such as
     in  the given example) that reflect the environment in which
     SoX is being run.  Enabling options such as --no-clobber  as
     default  might be handled better using a shell alias since a
     shell alias will not affect operation in scripts etc.

     One way to ensure  that  a  script  cannot  be  affected  by
     SOX_OPTS  is  to  clear SOX_OPTS at the start of the script,
     but this of course loses the benefit  of  SOX_OPTS  carrying
     some  system-wide  default options.  An alternative approach
     is to explicitly invoke SoX with default option values, e.g.

        SOX_OPTS="-V --no-clobber"
        sox -V2 --clobber $input $output ...
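
     The first approach, clearing SOX_OPTS at  the  start  of  a
     script, might look like this in a POSIX shell.  This is only
     a sketch; $input and $output are placeholders,  as  in  the
     example above:

```shell
#!/bin/sh
# Drop any inherited SOX_OPTS so that every SoX call below
# starts from SoX's built-in defaults.
unset SOX_OPTS

# Later invocations then state what they need explicitly, e.g.:
#   sox --no-clobber "$input" "$output"
```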

     Note  that  the way to set environment variables varies from
     system to system. Here are some examples:

     Unix bash:

        export SOX_OPTS="-V --no-clobber"

     Unix csh:

        setenv SOX_OPTS "-V --no-clobber"

     MS-DOS/MS-Windows:

        set SOX_OPTS=-V --no-clobber

     MS-Windows GUI: via Control Panel  :  System  :  Advanced  :
     Environment Variables

     Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.

     --buffer BYTES, --input-buffer BYTES
          Set the size in bytes of the buffers used for  process-
          ing  audio  (default 8192).  --buffer applies to input,
          effects, and output processing; --input-buffer  applies
          only  to  input  processing  (for  which  it  overrides
          --buffer if both are given).

          Be aware that large values for --buffer will cause  SoX
          to  become  slow  to  respond  to requests to terminate
          or to skip the current input file.

     --clobber
          Don't prompt before overwriting an existing  file  with
          the  same name as that given for the output file.  This
          is the default behaviour.

     --combine concatenate|merge|mix|mix-power|multiply|sequence
          Select the input file combining  method;  for  some  of
          these,  short  options are available: -m selects `mix',
          -M selects `merge', and -T selects `multiply'.

          See Input File Combining above for a description of the
          different combining methods.

     -D, --no-dither
          Disable  automatic  dither  -  see  `Dither' above.  An
          example of why this might occasionally be useful is  if
          a  file  has  been converted from 16 to 24 bit with the
          intention of doing some processing on it, but  in  fact
          no  processing  is needed after all and the original 16
          bit file has been lost,  then,  strictly  speaking,  no
          dither is needed if converting the file back to 16 bit.
          See also the stats effect  for  how  to  determine  the
          actual bit depth of the audio within a file.

     --effects-file FILENAME
          Use FILENAME to obtain all effects and their arguments.
          The file is parsed as if the values were  specified  on
          the  command  line.  A new line can be used in place of
          the special ":" marker to separate effect chains.  This
          option causes any effects specified on the command line
          to be discarded.
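
          As a sketch, the multiple-chain example given later  in
          this  page (trim 0 30 : newfile : restart) could be ex-
          pressed as an effects file with one chain per line.  The
          file names effects.txt, infile.wav and  output.wav  are
          hypothetical, and it is assumed here that pseudo-effects
          are accepted in an effects file exactly as they are  on
          the command line:

```shell
# One effects chain per line; the newline stands in for the ":" marker.
cat > effects.txt <<'EOF'
trim 0 30
newfile
restart
EOF

# Run only where SoX and the (hypothetical) input file are available.
if command -v sox >/dev/null 2>&1 && [ -f infile.wav ]; then
    sox --effects-file effects.txt infile.wav output.wav
fi
```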

     -G, --guard
          Automatically invoke the gain effect to  guard  against
          clipping. E.g.

             sox -G infile -b 16 outfile rate 44100 dither -s

          is shorthand for

             sox infile -b 16 outfile gain -h rate 44100 gain -rh dither -s

          See also -V, --norm, and the gain effect.

     -h, --help
          Show version number and usage information.

     --help-effect NAME
          Show  usage  information  on the specified effect.  The
          name all can be used to show usage on all effects.

     --help-format NAME
          Show information about the specified file format.   The
          name  all  can  be used to show information on all for-
          mats.

     --i, --info
          Only if given as the first parameter to sox, behave  as
          soxi(1).

     --interactive
          Deprecated alias for --no-clobber.

     -m|-M
          Equivalent   to  --combine  mix  and  --combine  merge,
          respectively.

     --magic
          If SoX has been  built  with  the  optional  `libmagic'
          library then this option can be given to enable its use
          in helping to detect audio file types.

     --multi-threaded | --single-threaded
          By default, SoX is `single threaded'.  If the  --multi-
          threaded  option is given however then SoX will process
          audio channels for most multi-channel effects in paral-
          lel  on  hyper-threading/multi-core architectures. This
          may reduce processing time, though sometimes it may  be
          necessary  to  use  this  option  in conjunction with a
          larger buffer size than is the default to gain any ben-
          efit  from  multi-threaded processing (e.g. 131072; see
          --buffer above).

     --no-clobber
          Prompt before overwriting an  existing  file  with  the
          same name as that given for the output file.

          N.B.  Unintentionally overwriting a file is easier than
          you might think, for example, if you accidentally enter

             sox file1 file2 effect1 effect2 ...

          when what you really meant was

             play file1 file2 effect1 effect2 ...

          then,  without  this option, file2 will be overwritten.
          Hence,  using  this  option  is  recommended.  SOX_OPTS
          (above),  a `shell' alias, script, or batch file may be
          an appropriate way of permanently enabling it.

     --norm
          Automatically invoke the gain effect to  guard  against
          clipping and to normalise the audio. E.g.

             sox --norm infile -b 16 outfile rate 44100 dither -s

          is shorthand for

             sox infile -b 16 outfile gain -h rate 44100 gain -nh dither -s

          See also -V, -G, and the gain effect.

     --play-rate-arg ARG
          Selects  a  quality  option  to be used when the `rate'
          effect is automatically invoked whilst  playing  audio.
          This  option is typically set via the SOX_OPTS environ-
          ment variable (see above).

     --plot gnuplot|octave|off
          If not set to off (the default if --plot is not given),
          run in a mode that can be used, in conjunction with the
          gnuplot program or the GNU Octave  program,  to  assist
          with  the  selection  and  configuration of many of the
          transfer-function based effects.  For the  first  given
          effect that supports the selected plotting program, SoX
          will output commands  to  plot  the  effect's  transfer
          function, and then exit without actually processing any
          audio.  E.g.

             sox --plot octave input-file -n highpass 1320 > highpass.plt
             octave highpass.plt

     -q, --no-show-progress
          Run in quiet mode when SoX wouldn't  otherwise  do  so.
          This is the opposite of the -S option.

     -R   Run  in  `repeatable' mode.  When this option is given,
          where applicable, SoX will embed a fixed time-stamp  in
          the  output  file  (e.g.   AIFF) and will `seed' pseudo
          random number generators (e.g.  dither)  with  a  fixed
          number,  thus  ensuring that successive SoX invocations
          with the same inputs and the same parameters yield  the
          same output.

     --replay-gain track|album|off
          Select  whether  or not to apply replay-gain adjustment
          to input files.  The default is off for  sox  and  rec,
          album  for  play  where  (at least) the first two input
          files are tagged with the same Artist and Album  names,
          and track for play otherwise.

     -S, --show-progress
          Display  input file format/header information, and pro-
          cessing progress as input file(s) percentage  complete,
          elapsed  time,  and  remaining time (if known; shown in
          brackets), and the number of  samples  written  to  the
          output  file.  Also shown is a peak-level meter, and an
          indication if clipping has  occurred.   The  peak-level
          meter  shows  up  to two channels and is calibrated for
          digital audio as follows (right channel shown):
                    dB FSD   Display   dB FSD   Display
                     -25     -          -11     ====
                     -23     =           -9     ====-
                     -21     =-          -7     =====
                     -19     ==          -5     =====-
                     -17     ==-         -3     ======
                     -15     ===         -1     =====!
                     -13     ===-

          A three-second peak-held value of headroom in dBs  will
          be  shown  to  the  right of the meter if this is below
          6dB.

          This option is enabled by default  when  using  SoX  to
          play or record audio.

     -T   Equivalent to --combine multiply.

     --temp DIRECTORY
          Specify  that  any temporary files should be created in
          the given DIRECTORY.  This can be useful if  there  are
          permission  or  free-space  problems  with  the default
          location. In this case, using `--temp .'  (to  use  the
          current directory) is often a good solution.

     --version
          Show SoX's version number and exit.

     -V[level]
          Set  verbosity.  This is particularly useful for seeing
          how any automatic effects have been invoked by SoX.

          SoX displays messages on the console (stderr) according
          to the following verbosity levels:

          0    No  messages are shown at all; use the exit status
               to determine if an error has occurred.

          1    Only error messages are shown.  These  are  gener-
               ated  if  SoX  cannot  complete the requested com-
               mands.

          2    Warning messages are also shown.  These are gener-
               ated  if  SoX can complete the requested commands,
               but not exactly according to the requested command
               parameters, or if clipping occurs.

          3    Descriptions  of  SoX's processing phases are also
               shown.  Useful for seeing exactly how SoX is  pro-
               cessing your audio.

          4 and above
               Messages  to  help  with  debugging  SoX  are also
               shown.

          By default, the verbosity level  is  set  to  2  (shows
          errors  and warnings). Each occurrence of the -V option
          increases the verbosity level by 1.  Alternatively, the
          verbosity  level  can  be  set to an absolute number by
          specifying it immediately after the -V, e.g.  -V0  sets
          it to 0.

  Input File Options
     These options apply only to input files and may precede only
     input filenames on the command line.

     --ignore-length
          Override an (incorrect) audio length given in an  audio
          file's  header.  If  this option is given then SoX will
          keep reading audio until it  reaches  the  end  of  the
          input file.

     -v, --volume FACTOR
          Intended  for  use when combining multiple input files,
          this option adjusts the volume of the file that follows
          it  on  the  command  line  by a factor of FACTOR. This
          allows it to  be  `balanced'  w.r.t.  the  other  input
          files.   This  is a linear (amplitude) adjustment, so a
          number less than 1 decreases the volume  and  a  number
          greater  than  1 increases it.  If a negative number is
          given then in addition to the  volume  adjustment,  the
          audio signal will be inverted.

          See also the norm, vol, and gain effects, and see Input
          File Balancing above.
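
          Because FACTOR is a linear amplitude ratio,  a  change
          thought of in decibels has to be converted first, using
          dB = 20 log10(FACTOR).  The sketch below does  this  in
          both  directions with awk; the function names are hypo-
          thetical helpers and are not part of SoX:

```shell
# dB -> linear amplitude factor: factor = 10^(dB/20)
db_to_factor() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

# linear amplitude factor -> dB: dB = 20 * log10(factor)
factor_to_db() {
    awk -v f="$1" 'BEGIN { printf "%.2f\n", 20 * log(f) / log(10) }'
}

db_to_factor -6     # roughly half amplitude
factor_to_db 0.5    # roughly -6 dB

# Possible use when mixing (a.wav, b.wav, mix.wav are placeholder names):
#   sox -m -v "$(db_to_factor -6)" a.wav b.wav mix.wav
```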

  Input & Output File Format Options
     These options apply to the input or output file  whose  name
     they  immediately  precede  on the command line and are used
     mainly when working with headerless  file  formats  or  when
     specifying a format for the output file that is different to
     that of the input file.

     -b BITS, --bits BITS
          The number of bits (a.k.a. bit-depth or sometimes word-
          length) in each encoded sample.  Not applicable to com-
          plex encodings such as MP3 or GSM.  Not necessary  with
          encodings  that  have  a  fixed  number  of  bits, e.g.
          A/μ-law, ADPCM.

          For an input file, the most common use for this  option
          is  to inform SoX of the number of bits per sample in a
          `raw' (`headerless') audio file.  For example

             sox -r 16k -e signed -b 8 input.raw output.wav

          converts a particular `raw' file to  a  self-describing
          `WAV' file.

          For  an  output  file, this option can be used (perhaps
          along with -e) to set the  output  encoding  size.   By
          default  (i.e. if this option is not given), the output
          encoding size will (providing it is  supported  by  the
          output  file  type)  be set to the input encoding size.
          For example

             sox input.cdda -b 24 output.wav

          converts raw CD digital audio (16-bit,  signed-integer)
          to a 24-bit (signed-integer) `WAV' file.

     -1/-2/-3/-4/-8
          The number of bytes in each encoded sample.  Deprecated
          aliases for -b 8, -b 16, -b 24, -b 32,  -b  64  respec-
          tively.

     -c CHANNELS, --channels CHANNELS
          The  number  of  audio channels in the audio file. This
          can be any number greater than zero.

          For an input file, the most common use for this  option
          is  to  inform SoX of the number of channels in a `raw'
          (`headerless') audio file.   Occasionally,  it  may  be
          useful  to  use  this option with a `headered' file, in
          order to override the (presumably incorrect)  value  in
          the header - note that this is only supported with cer-
          tain file types.  Examples:

             sox -r 48k -e float -b 32 -c 2 input.raw output.wav

          converts a particular `raw' file to  a  self-describing
          `WAV' file.

             play -c 1 music.wav

          interprets the file data as belonging to a single chan-
          nel regardless of what is indicated in the file header.
          Note  that  if the file does in fact have two channels,
          this will result in the file playing at half speed.

          For an output file, this option  provides  a  shorthand
          for  specifying  that  the  channels  effect  should be
          invoked in order to change (if necessary) the number of
          channels  in the audio signal to the number given.  For
          example, the following two commands are equivalent:

             sox input.wav -c 1 output.wav bass -3
             sox input.wav      output.wav bass -3 channels 1

          though the second form is more flexible  as  it  allows
          the effects to be ordered arbitrarily.
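
          For a `raw' file, the values given with -r, -b  and  -c
          alone determine how many bytes make up  one  second  of
          audio, which is why a wrong channel count  changes  the
          apparent duration.  The following sketch works  through
          the arithmetic for the raw example above;  the  size is
          a hypothetical figure chosen to give round numbers:

```shell
# Parameters from the raw example above: -r 48k -e float -b 32 -c 2
rate=48000
bits=32
channels=2

# Bytes consumed per second of audio.
bytes_per_second=$(( rate * channels * bits / 8 ))

# Hypothetical file size in bytes.
size=3840000
seconds_stereo=$(( size / bytes_per_second ))

# Read as one channel instead (cf. play -c 1): same bytes, twice the duration.
seconds_mono=$(( size / (rate * 1 * bits / 8) ))

echo "$bytes_per_second bytes/s; $seconds_stereo s as stereo, $seconds_mono s as mono"
```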

     -e ENCODING, --encoding ENCODING
          The  audio  encoding type.  Sometimes needed with file-
          types that support more than  one  encoding  type.  For
          example,  with  raw,  WAV, or AU (but not, for example,
          with MP3 or FLAC).  The available encoding types are as
          follows:

          signed-integer
               PCM data stored  as  signed  (`two's  complement')
               integers.  Commonly used with a 16- or 24-bit
               encoding size.  A value of 0 represents minimum
               signal power.

          unsigned-integer
               PCM data stored as unsigned integers.  Commonly
               used with an 8-bit encoding size.  A value of 0
               represents maximum signal power.

          floating-point
               PCM data stored as IEEE 754 single precision
               (32-bit) or double precision (64-bit) floating-
               point  (`real')  numbers.  A value of 0 represents
               minimum signal power.

          a-law
               International telephony standard  for  logarithmic
               encoding to 8 bits per sample.  It has a precision
               equivalent to roughly 13-bit PCM and is  sometimes
               encoded  with  reversed  bit-ordering  (see the -X
               option).

          u-law, mu-law
               North American telephony standard for  logarithmic
               encoding to 8 bits per sample.  A.k.a. μ-law.  It
               has a precision equivalent to roughly  14-bit  PCM
               and  is sometimes encoded with reversed bit-order-
               ing (see the -X option).

          oki-adpcm
               OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit  ADPCM;
               it  has  a  precision equivalent to roughly 12-bit
               PCM.  ADPCM is a form of  audio  compression  that
               has  a  good  compromise between audio quality and
               encoding/decoding speed.

          ima-adpcm
               IMA (a.k.a. DVI) 4-bit ADPCM; it has  a  precision
               equivalent to roughly 13-bit PCM.

          ms-adpcm
               Microsoft  4-bit ADPCM; it has a precision equiva-
               lent to roughly 14-bit PCM.

          gsm-full-rate
               GSM is currently used for the vast majority of the
               world's  digital  wireless  telephone  calls.   It
               utilises several audio formats with different bit-
               rates and associated speech quality.  SoX has sup-
               port for GSM's original 13kbps `Full  Rate'  audio
               format.   It is usually CPU-intensive to work with
               GSM audio.

          Encoding names can be abbreviated where this would  not
          be  ambiguous;  e.g. `unsigned-integer' can be given as
          `un', but not `u' (ambiguous with `u-law').

          For an input file, the most common use for this  option
          is  to  inform SoX of the encoding of a `raw' (`header-
          less') audio file  (see  the  examples  in  -b  and  -c
          above).

          For  an  output  file, this option can be used (perhaps
          along with -b) to set the  output  encoding  type.  For
          example

             sox input.cdda -e float output1.wav

             sox input.cdda -b 64 -e float output2.wav

          convert  raw  CD digital audio (16-bit, signed-integer)
          to floating-point `WAV' files (single &  double  preci-
          sion respectively).

          By default (i.e. if this option is not given), the out-
          put encoding type will (providing it  is  supported  by
          the  output  file  type)  be  set to the input encoding
          type.

     -s/-u/-f/-U/-A/-o/-i/-a/-g
          Deprecated aliases for specifying  the  encoding  types
          signed-integer,  unsigned-integer,  floating-point, mu-
          law, a-law, oki-adpcm, ima-adpcm,  ms-adpcm,  gsm-full-
          rate respectively (see -e above).

     --no-glob
          Specifies that filename `globbing' (wild-card matching)
          should not be performed by SoX on the  following  file-
          name.   For  example, if the current directory contains
          the two files `five-seconds.wav' and `five*.wav', then

             play --no-glob "five*.wav"

          can be used to play just the single file `five*.wav'.

     -r, --rate RATE[k]
          Gives the sample rate in Hz (or kHz  if  appended  with
          `k') of the file.

          For  an input file, the most common use for this option
          is to inform SoX of the sample rate of a `raw'  (`head-
          erless')  audio  file  (see  the  examples in -b and -c
          above).  Occasionally it may  be  useful  to  use  this
          option with a `headered' file, in order to override the
          (presumably incorrect) value in the header - note  that
          this  is  only  supported with certain file types.  For
          example, if audio was recorded with  a  sample-rate  of
          say  48k  from  a source that played back a little, say
          1.5%, too slowly, then

             sox -r 48720 input.wav output.wav

          effectively corrects the speed  by  changing  only  the
          file header (but see also the speed effect for the more
          usual solution to this problem).
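
          The figure 48720 in the example is simply  the  nominal
          rate  raised by 1.5%.  A sketch of that arithmetic,  us-
          ing integer shell maths and  the  same  simple  propor-
          tional  adjustment as the text (the variable names  are
          illustrative only):

```shell
nominal=48000     # rate stored in the file header, in Hz
slow_permille=15  # source ran 1.5% slow; per mille keeps the maths in integers

# Raise the nominal rate by the same proportion.
corrected=$(( nominal * (1000 + slow_permille) / 1000 ))
echo "$corrected"

# The header-only fix would then be (file names as in the example above):
#   sox -r "$corrected" input.wav output.wav
```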

          For an output file, this option  provides  a  shorthand
          for  specifying  that the rate effect should be invoked
          in order to change (if necessary) the  sample  rate  of
          the  audio signal to the given value.  For example, the
          following two commands are equivalent:

             sox input.wav -r 48k output.wav bass -3
             sox input.wav        output.wav bass -3 rate 48k

          though the second form is more flexible  as  it  allows
          rate  options to be given, and allows the effects to be
          ordered arbitrarily.

     -t, --type FILE-TYPE
          Gives the type of the audio file.  For both  input  and
          output  files,  this  option is commonly used to inform
          SoX of the type of a `headerless' audio file (e.g. raw,
          mp3) where the actual/desired type cannot be determined
          from a given filename extension.  For example:

             another-command | sox -t mp3 - output.wav

             sox input.wav -t raw output.bin

          It can also be used to override the type implied by  an
          input filename extension, but if overriding with a type
          that has a header, SoX will exit  with  an  appropriate
          error message if such a header is not actually present.

          See soxformat(4) for a list of supported file types.

     -L, --endian little
     -B, --endian big
     -x, --endian swap
          These options specify whether  the  byte-order  of  the
          audio  data  is,  respectively,  `little  endian', `big
          endian', or the opposite to that of the system on which
          SoX  is  being  used.   Endianness applies only to data
          encoded as floating-point, or  as  signed  or  unsigned
          integers  of 16 or more bits.  It is often necessary to
          specify one of these options for headerless files,  and
          sometimes  necessary  for  (otherwise)  self-describing
          files.  A given endian-setting option  may  be  ignored
          for  an  input  file  whose  header contains a specific
          endianness identifier, or for an output  file  that  is
          actually an audio device.

          N.B.   Unlike other format characteristics, the endian-
          ness (byte, nibble, & bit ordering) of the  input  file
          is  not automatically used for the output file; so, for
          example, when the following is run on  a  little-endian
          system:

             sox -B audio.s16 trimmed.s16 trim 2

          trimmed.s16 will be created as little-endian;

             sox -B audio.s16 -B trimmed.s16 trim 2

          must  be  used to preserve big-endianness in the output
          file.

          The -V option can be used to check the selected  order-
          ing.

     -N, --reverse-nibbles
          Specifies  that  the nibble ordering (i.e. the 2 halves
          of a byte) of the samples should be reversed; sometimes
          useful with ADPCM-based formats.

          N.B.  See also N.B. in section on -x above.

     -X, --reverse-bits
          Specifies  that  the bit ordering of the samples should
          be reversed; sometimes useful with a few (mostly  head-
          erless) formats.

          N.B.  See also N.B. in section on -x above.

  Output File Format Options
     These  options apply only to the output file and may precede
     only the output filename on the command line.

     --add-comment TEXT
          Append a comment  in  the  output  file  header  (where
          applicable).

     --comment TEXT
          Specify  the  comment  text to store in the output file
          header (where applicable).

          SoX will provide a default comment if this  option  (or
          --comment-file)  is  not given. To specify that no com-
          ment should be stored in the output file, use --comment
          "" .

     --comment-file FILENAME
          Specify  a file containing the comment text to store in
          the output file header (where applicable).

     -C, --compression FACTOR
          The compression factor for variably compressing  output
          file  formats.   If  this  option  is  not given then a
          default compression factor will apply.  The compression
          factor  is  interpreted  differently for different com-
          pressing file formats.  See the description of the file
          formats  that  use this option in soxformat(4) for more
          information.

     In addition  to  converting,  playing  and  recording  audio
     files,  SoX  can  be  used  to  invoke  a  number  of  audio
     `effects'.  Multiple effects may be  applied  by  specifying
     them  one  after another at the end of the SoX command line,
     forming an `effects chain'.   Note  that  applying  multiple
     effects  in real-time (i.e. when playing audio) is likely to
     require a high performance computer. Stopping other applica-
     tions may alleviate performance issues should they occur.

     Some of the SoX effects are primarily intended to be applied
     to a single instrument or `voice'.  To facilitate this,  the
     remix  effect  and  the  global SoX option -M can be used to
     isolate then recombine tracks from a multi-track  recording.

  Multiple Effect Chains
     A  single  effects  chain is made up of one or more effects.
     Audio from the input runs through the chain until either the
     end  of  the input file is reached or an effect in the chain
     requests to terminate the chain.

     SoX supports running multiple effects chains over the  input
     audio.   In  this  case, when one chain indicates it is done
     processing audio, the audio data is then  sent  through  the
     next  effects  chain.   This  continues until either no more
     effects chains exist or the input has reached the end of the
     file.

     An  effects chain is terminated by placing a : (colon) after
     an effect.  Any following  effects  are  a  part  of  a  new
     effects chain.

     It is important to place the effect that will stop the chain
     as the first effect in the chain.  This is because any
     samples that are buffered by effects to the left of the
     terminating effect will be discarded.  The number of samples
     discarded is related to the --buffer option, which should be
     kept small relative to the sample rate if the terminating
     effect cannot be first.  Further information on stopping
     effects can be found in the Stopping SoX section.

     There are a  few  pseudo-effects  that  aid  using  multiple
     effects  chains.   These  include  newfile  which will start
     writing to a new output  file  before  moving  to  the  next
     effects  chain and restart which will move back to the first
     effects chain.  Pseudo-effects  must  be  specified  as  the
     first  effect  in  a chain and as the only effect in a chain
     (they must have a : before and after they are specified).

     The following is an example of multiple effects chains.   It
     will  split the input file into multiple files of 30 seconds
     in length.  Each output filename will have a unique number in
     its name as documented in the Output Files section.

        sox infile.wav output.wav trim 0 30 : newfile : restart

  Common Notation And Parameters
     In  the  descriptions  that follow, brackets [ ] are used to
     denote parameters that are optional, braces {  }  to  denote
     those  that  are  both  optional  and  repeatable, and angle
     brackets < > to denote those that  are  repeatable  but  not
     optional.   Where  applicable,  default  values for optional
     parameters are shown in parentheses ( ).

     The following parameters are used with, and  have  the  same
     meaning for, several effects:

     center[k]
          See frequency.

     frequency[k]
          A frequency in Hz, or, if appended with `k', kHz.

     gain A power gain in dB.  Zero gives no gain; less than zero
          gives an attenuation.

     width[h|k|o|q]
          Used to specify the band-width of a filter.   A  number
          of different methods to specify the width are available
          (though not all for every effect).  One of the  charac-
          ters shown may be appended to select the desired method
          as follows:
                               Method    Notes
                          h      Hz
                          k     kHz
                          o   Octaves
                          q   Q-factor   See [2]

          For each effect that uses this parameter,  the  default
          method  (i.e.  if  no character is appended) is the one
          that is listed first in the first line of the  effect's
          description.

     To  see if SoX has support for an optional effect, enter sox
     -h and look for its name under the list: `EFFECTS'.

  Supported Effects
     Note: a categorised list of the effects can be found in  the
     accompanying `README' file.

     allpass frequency[k] width[h|k|o|q]
          Apply a two-pole all-pass filter with central frequency
          (in Hz) frequency, and filter-width width.  An all-pass
          filter changes the audio's frequency to phase relation-
          ship  without  changing  its  frequency  to   amplitude
          relationship.   The  filter  is  described in detail in
          [1].

          This effect supports the --plot global option.

     band [-n] center[k] [width[h|k|o|q]]
          Apply a band-pass filter.  The frequency response drops
          logarithmically around the center frequency.  The width
          parameter gives the slope of the drop.  The frequencies
          at  center  +  width and center - width will be half of
          their original amplitudes.  band  defaults  to  a  mode
          oriented  to  pitched  audio,  i.e.  voice, singing, or
          instrumental music.  The -n (for noise) option uses the
          alternate  mode for un-pitched audio (e.g. percussion).
          Warning: -n introduces a power-gain of  about  11dB  in
          the  filter, so beware of output clipping.  band intro-
          duces noise in the shape of the filter, i.e. peaking at
          the center frequency and settling around it.

          This effect supports the --plot global option.

          See also sinc for a bandpass filter with steeper shoul-
          ders.

     bandpass|bandreject [-c] frequency[k] width[h|k|o|q]
          Apply a two-pole Butterworth band-pass  or  band-reject
          filter  with  central  frequency  frequency,  and (3dB-
          point) band-width width.  The -c option applies only to
          bandpass and selects a constant skirt gain (peak gain =
          Q) instead of the default: constant 0dB peak gain.  The
          filters  roll  off  at 6dB per octave (20dB per decade)
          and are described in detail in [1].

          These effects support the --plot global option.

          See also sinc for a bandpass filter with steeper shoul-
          ders.

     bandreject frequency[k] width[h|k|o|q]
          Apply a band-reject filter.  See the description of the
          bandpass effect for details.

     bass|treble gain [frequency[k] [width[s|h|k|o|q]]]
          Boost or cut the bass (lower) or  treble  (upper)  fre-
          quencies  of the audio using a two-pole shelving filter
          with a response similar to that of a  standard  hi-fi's
          tone-controls.   This is also known as shelving equali-
          sation (EQ).

          gain gives the gain at 0 Hz (for bass), or whichever is
          the  lower  of  ~22 kHz  and the Nyquist frequency (for
          treble).  Its useful range is about -20  (for  a  large
          cut)  to  +20  (for a large boost).  Beware of Clipping
          when using a positive gain.

          If desired, the filter can be fine-tuned using the fol-
          lowing optional parameters:

          frequency  sets  the  filter's central frequency and so
          can be used to extend or reduce the frequency range  to
          be  boosted  or  cut.  The default value is 100 Hz (for
          bass) or 3 kHz (for treble).

          width determines how steep the filter's shelf transition
          is.  In addition to the common width specification
          methods described above, `slope' (the  default,  or  if
          appended  with  `s')  may be used.  The useful range of
          `slope' is about 0.3, for a gentle  slope,  to  1  (the
          maximum),  for a steep slope; the default value is 0.5.

          The filters are described in detail in [1].

          These effects support the --plot global option.

          See also equalizer for a peaking equalisation effect.

     bend [-f frame-rate(25)] [-o over-sample(16)]
          { delay,cents,duration }
          Changes pitch by specified amounts at specified  times.
          Each  given  triple: delay,cents,duration specifies one
          bend.  delay is the amount of time after the  start  of
          the  audio  stream, or the end of the previous bend, at
          which to start bending the pitch; cents is  the  number
          of  cents (100 cents = 1 semitone) by which to bend the
          pitch, and duration the length of time over  which  the
          pitch will be bent.

          The   pitch-bending  algorithm  utilises  the  Discrete
          Fourier Transform (DFT) at a particular frame rate  and
          over-sampling  rate.   The  -f and -o parameters may be
          used to adjust these parameters and  thus  control  the
          smoothness of the changes in pitch.

          For  example,  an  initial tone is generated, then bent
          three times, yielding four different notes in total:

             play -n synth 2.5 sin 667 gain 1 \
               bend .35,180,.25  .15,740,.53  0,-520,.3

          Note that the clipping that is produced in this example
          is  deliberate;  to  remove it, use gain -5 in place of
          gain 1.
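
          The cents arithmetic can be sketched numerically: since
          1200 cents make one octave, a bend of c cents scales the
          frequency by 2^(c/1200).  The helper name cents_ratio is
          hypothetical (it is not part of SoX), and the sketch
          assumes that awk is available:

```shell
# cents_ratio CENTS: print the frequency ratio for a bend of CENTS cents.
# Hypothetical helper for illustration only; it is not part of SoX.
cents_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(c / 1200 * log(2)) }'
}

cents_ratio 100      # ratio for one semitone up
cents_ratio -1200    # ratio for one octave down
```

          Multiplying a starting frequency by the printed ratio
          gives the frequency reached at the end of that bend.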

     biquad b0 b1 b2 a0 a1 a2
          Apply a biquad IIR filter with the given  coefficients,
          where b* and a* are the numerator and denominator coef-
          ficients respectively.

          (where a0 = 1).
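
          For illustration, the conventional biquad difference
          equation, y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2]
          - a1*y[n-1] - a2*y[n-2]) / a0, can be sketched in awk.
          This assumes the usual direct-form convention; the name
          biquad_filter is hypothetical and the sketch is not SoX's
          own implementation:

```shell
# biquad_filter b0 b1 b2 a0 a1 a2: filter one sample per line from stdin.
# Hypothetical helper; assumes the conventional direct-form I equation.
biquad_filter() {
    awk -v b0="$1" -v b1="$2" -v b2="$3" -v a0="$4" -v a1="$5" -v a2="$6" '
    {
        x0 = $1
        y0 = (b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        printf "%.4f\n", y0
        x2 = x1; x1 = x0    # shift the input history
        y2 = y1; y1 = y0    # shift the output history
    }'
}

# Impulse response of a two-tap averager (b0 = b1 = 0.5, no feedback).
printf '1\n0\n0\n' | biquad_filter 0.5 0.5 0 1 0 0
```

          Feeding a unit impulse, as here, prints the first few
          values of the filter's impulse response, which is a quick
          way to sanity-check a set of coefficients.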

     channels CHANNELS
          Invoke a simple algorithm to change the number of chan-
          nels in the audio signal to the given number  CHANNELS:
          mixing  if  decreasing the number of channels or dupli-
          cating if increasing the number of channels.

          The channels effect is invoked automatically  if  SoX's
          -c  option  specifies a number of channels that is dif-
          ferent to that of the input file(s).  Alternatively, if
          this  effect  is given explicitly, then SoX's -c option
          need not be given.  For example, the following two com-
          mands are equivalent:

             sox input.wav -c 1 output.wav bass -3
             sox input.wav      output.wav bass -3 channels 1

          though  the  second  form is more flexible as it allows
          the effects to be ordered arbitrarily.

          See also remix for an effect that allows channels to be
          mixed/selected arbitrarily.

     chorus gain-in gain-out <delay decay speed depth -s|-t>
          Add a chorus effect to the audio.  This can make a sin-
          gle vocal sound like a chorus, but can also be  applied
          to instrumentation.

          Chorus resembles an echo effect with a short delay, but
          whereas with echo the delay is constant,  with  chorus,
          it is varied using sinusoidal or triangular modulation.
          The modulation depth defines how far the modulated delay
          swings before or after the base delay.  Hence the delayed
          sound will sound slower or faster, i.e. it is tuned around
          the original one, as in a chorus where some vocals are
          slightly off key.  See [3] for more discussion of the
          chorus effect.

          Each four-tuple parameter delay/decay/speed/depth gives
          the delay in milliseconds and the  decay  (relative  to
          gain-in)  with  a modulation speed in Hz using depth in
          milliseconds.  The modulation is either sinusoidal (-s)
          or triangular (-t).  Gain-out is the volume of the out-
          put.

          A typical delay is around 40ms to 60ms; the  modulation
          speed  is  best  near  0.25Hz  and the modulation depth
          around 2ms.  For example, a single delay:

             play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 -t

          Two delays of the original samples:

             play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 -t \
                60 0.32 0.4 1.3 -s

          A  fuller  sounding  chorus  (with   three   additional
          delays):

             play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
                60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s

     compand attack1,decay1{,attack2,decay2}
          [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
          [gain [initial-volume-dB [delay]]]

          Compand  (compress  or expand) the dynamic range of the
          audio.

          The attack and decay parameters (in seconds)  determine
          the  time  over  which  the  instantaneous level of the
          input signal  is  averaged  to  determine  its  volume;
          attacks  refer  to increases in volume and decays refer
          to decreases.  For most  situations,  the  attack  time
          (response  to  the  music  getting  louder)  should  be
          shorter than the decay time because the  human  ear  is
          more  sensitive  to  sudden loud music than sudden soft
          music.  Where more than one pair of attack/decay param-
          eters  are  specified,  each input channel is companded
          separately and the number of pairs must agree with  the
          number  of  input channels.  Typical values are 0.3,0.8
          seconds.

          The second parameter is a list of points on the compan-
          der's transfer function specified in dB relative to the
          maximum possible signal amplitude.   The  input  values
          must be in a strictly increasing order but the transfer
          function does not have to be monotonically rising.   If
          omitted,  the  value  of  out-dB1  defaults to the same
          value as in-dB1; levels below in-dB1 are not  companded
          (but  may have gain applied to them).  The point 0,0 is
          assumed but may be overridden (by 0,out-dBn).   If  the
          list  is  preceded  by  a  soft-knee-dB value, then the
          points at where adjacent line segments on the  transfer
          function  meet  will  be  rounded  by the amount given.
          Typical  values   for   the   transfer   function   are
          6:-70,-60,-20.

          The third (optional) parameter is an additional gain in
          dB to be applied at all points on the transfer function
          and allows easy adjustment of the overall gain.

          The  fourth (optional) parameter is an initial level to
          be assumed for each  channel  when  companding  starts.
          This  permits  the  user to supply a nominal level ini-
          tially, so that, for example, a very large gain is  not
          applied  to initial signal levels before the companding
          action has begun to operate: it is quite probable  that
          in  such an event, the output would be severely clipped
          while the compander gain properly  adjusts  itself.   A
          typical  value  (for audio which is initially quiet) is
          -90 dB.

          The fifth (optional) parameter is a delay  in  seconds.
          The input signal is analysed immediately to control the
          compander, but it is delayed before being  fed  to  the
          volume  adjuster.   Specifying  a  delay  approximately
          equal to the attack/decay times allows the compander to
          effectively  operate  in  a  `predictive' rather than a
          reactive mode.  A typical value is 0.2 seconds.
                            *        *        *

          The following example might be used to make a piece  of
          music  with  both  quiet and loud passages suitable for
          listening to in a noisy environment such  as  a  moving
          vehicle:

             sox asz.wav asz-car.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2

          The transfer function (`6:-70,...') says that very soft
          sounds (below -70dB) will remain unchanged.  This  will
          stop the compander from boosting the volume on `silent'
          passages such as between movements.  However, sounds in
          the range -60dB to 0dB (maximum volume) will be boosted
          so that the 60dB dynamic range of  the  original  music
          will  be  compressed 3-to-1 into a 20dB range, which is
          wide enough to enjoy the music but narrow enough to get
          around  the road noise.  The `6:' selects 6dB soft-knee
          companding.  The -5 (dB) output gain is needed to avoid
          clipping  (the  number  is  inexact, and was derived by
          experimentation).  The -90 (dB) for the initial  volume
          will  work  fine  for  a  clip  that  starts  with near
          silence, and the delay of 0.2 (seconds) has the  effect
          of causing the compander to react a bit more quickly to
          sudden volume changes.
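
          The static part of this transfer function can be checked
          with a small piecewise-linear interpolation.  The sketch
          below is a simplification under stated assumptions: it
          ignores the soft knee, the output gain, and the
          attack/decay averaging, it keeps unity slope below the
          first point, and it takes the points as an explicit list
          of in/out pairs (including the implicit 0,0).  The helper
          name compand_out is hypothetical:

```shell
# compand_out IN_DB "in1 out1 in2 out2 ...": interpolate the transfer curve.
# Hypothetical helper; ignores soft knee, gain, attack/decay and delay.
compand_out() {
    awk -v x="$1" -v pts="$2" 'BEGIN {
        n = split(pts, p, " ")
        x = x + 0
        # Below the first point: unity slope through that point.
        if (x <= p[1] + 0) { printf "%.2f\n", x + (p[2] - p[1]); exit }
        for (i = 3; i < n; i += 2) {
            if (x <= p[i] + 0) {
                slope = (p[i+1] - p[i-1]) / (p[i] - p[i-2])
                printf "%.2f\n", p[i-1] + slope * (x - p[i-2])
                exit
            }
        }
        printf "%.2f\n", p[n]    # above the last point
    }'
}

# Points for `6:-70,-60,-20' with the implicit 0,0 end point written out.
compand_out -40 "-70 -70 -60 -20 0 0"
```

          Between -60dB and 0dB the computed slope is 1/3, which
          corresponds to the 3-to-1 compression described above.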

          In the next example, compand is being used as a  noise-
          gate  for  when  the noise is at a lower level than the
          signal:

             play infile compand .1,.2 -inf,-50.1,-inf,-50,-50 0 -90 .1

          Here is another noise-gate,  this  time  for  when  the
          noise  is at a higher level than the signal (making it,
          in some ways, similar to squelch):

             play infile compand .1,.1 -45.1,-45,-inf,0,-inf 45 -90 .1

          This effect supports the --plot global option (for  the
          transfer function).

          See   also  mcompand  for  a  multiple-band  companding
          effect.

     contrast [enhancement-amount(75)]
          Comparable with compression, this  effect  modifies  an
          audio  signal  to  make  it sound louder.  enhancement-
          amount controls the amount of the enhancement and is  a
          number  in  the  range  0-100.   Note that enhancement-
          amount = 0 still gives a significant contrast  enhance-
          ment.

          See also the compand and mcompand effects.

     dcshift shift [limitergain]
          Apply  a  DC shift to the audio.  This can be useful to
          remove a DC offset (caused perhaps by a hardware  prob-
          lem in the recording chain) from the audio.  The effect
          of a DC offset is reduced headroom  and  hence  volume.
          The  stat or stats effect can be used to determine if a
          signal has a DC offset.

          The given dcshift value is a floating point  number  in
          the range of +-2 that indicates the amount to shift the
          audio (which is in the range of +-1).

          An optional limitergain can be specified as  well.   It
          should  have  a  value  much  less than 1 (e.g. 0.05 or
          0.02) and is used only on peaks to prevent clipping.
                            *        *        *

          An alternative approach to removing a DC offset (albeit
          with  a  short  delay)  is  to  use the highpass filter
          effect at a frequency of say 10Hz,  as  illustrated  in
          the following example:

             sox -n dc.wav synth 5 sin %0 50
             sox dc.wav fixed.wav highpass 10

     deemph
          Apply  Compact  Disc  (IEC 60908) de-emphasis (a treble
          attenuation shelving filter).

          Pre-emphasis was applied in the mastering of  some  CDs
          issued in the early 1980s.  These included many classi-
          cal music albums, as well as now sought-after issues of
          albums  by  The  Beatles,  Pink Floyd and others.  Pre-
          emphasis should be removed at playback time  by  a  de-
          emphasis  filter  in the playback device.  However, not
          all modern CD players have this filter, and very few PC
          CD drives have it; playing pre-emphasised audio without
          the correct de-emphasis filter results  in  audio  that
          sounds   harsh  and  is  far  from  what  its  creators
          intended.

          With the deemph effect, it is  possible  to  apply  the
          necessary  de-emphasis to audio that has been extracted
          from a pre-emphasised CD, and then either burn the  de-
          emphasised audio to a new CD (which will then play cor-
          rectly on any CD player), or simply play the  correctly
          de-emphasised audio files on the PC.  For example:

             sox track1.wav track1-deemph.wav deemph

          and then burn track1-deemph.wav to CD, or

             play track1-deemph.wav

          or simply

             play track1.wav deemph

          The  de-emphasis filter is implemented as a biquad; its
          maximum deviation  from  the  ideal  response  is  only
          0.06dB (up to 20kHz).

          This effect supports the --plot global option.

          See  also  the  bass  and  treble shelving equalisation
          effects.

     delay {length}
          Delay one or more audio channels.  length can specify a
          time  or, if appended with an `s', a number of samples.
          Do not specify both time and samples delays in the same
          command.  For example, delay 1.5 0 0.5 delays the first
          channel by 1.5 seconds, the third channel by  0.5  sec-
          onds,  and  leaves  the  second  channel (and any other
          channels that may be present) un-delayed.  The  follow-
          ing (one long) command plays a chime sound:

             play -n synth -j 3 sin %3 sin %-2 sin %-5 sin %-9 \
               sin %-14 sin %-21 fade h .01 2 1.5 delay \
               1.3 1 .76 .54 .27 remix - fade h 0 2.7 2.5 norm -1

          and this plays a guitar chord:

             play -n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \
               delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1

     dither [-a] [-S|-s|-f filter]
          Apply  dithering  to the audio.  Dithering deliberately
          adds a small amount of noise to the signal in order  to
          mask audible quantization effects that can occur if the
          output sample size is  less  than  24  bits.   With  no
          options,  this  effect will add triangular (TPDF) white
          noise.  Noise-shaping (only for certain  sample  rates)
          can  be  selected  with  -s.  With the -f option, it is
          possible to select a  particular  noise-shaping  filter
          from  the  following  list: lipshitz, f-weighted, modi-
          fied-e-weighted,  improved-e-weighted,  gesemann,  shi-
          bata, low-shibata, high-shibata.  Note that most filter
          types are available only with 44100Hz sample rate.  The
          filter types are distinguished by the following proper-
          ties: audibility of noise, level of (inaudible, but  in
          some  circumstances, otherwise problematic) shaped high
          frequency noise, and processing speed.
          See   for
          graphs of the different noise-shaping curves.

          The  -S option selects a slightly `sloped' TPDF, biased
          towards higher frequencies.  It can be used at any sam-
          pling  rate  but below 22k, plain TPDF is probably bet-
          ter, and above  37k, noise-shaped is probably better.

          The -a option  enables  a  mode  where  dithering  (and
          noise-shaping  if applicable) are automatically enabled
          only when needed.  The most likely use for this is when
          applying fade in or out to an already dithered file, so
          that the redithering applies only  to  the  faded  por-
          tions.   However,  auto dithering is not fool-proof, so
          the fades should be carefully  checked  for  any  noise
          modulation;  if  this occurs, then either re-dither the
          whole file, or use trim, fade, and concatenate.

          If the SoX global option -R is not given, then
          the pseudo-random number generator used to generate the
          white noise will  be  `reseeded',  i.e.  the  generated
          noise will be different between invocations.

          This  effect should not be followed by any other effect
          that affects the audio.

          See also the `Dither' section above.

     earwax
          Makes audio easier to listen to  on  headphones.   Adds
          `cues'  to  44.1kHz stereo (i.e. audio CD format) audio
          so that when listened to on headphones the stereo image
          is  moved  from  inside  your  head (standard for head-
          phones) to outside and in front of the listener  (stan-
          dard     for    speakers).     See    http://www.geoci-
 for a full explanation.

     echo gain-in gain-out <delay decay>
          Add echoing to the audio.  Echoes are  reflected  sound
          and  can  occur  naturally amongst mountains (and some-
          times large buildings) when talking or shouting;  digi-
          tal  echo  effects emulate this behaviour and are often
          used to help fill out the sound of a single  instrument
          or  vocal.   The  time  difference between the original
          signal and the reflection is the  `delay'  (time),  and
          the  loudness  of  the reflected signal is the `decay'.
          Multiple echoes can have different delays and decays.

          Each given delay decay pair gives  the  delay  in  mil-
          liseconds  and  the decay (relative to gain-in) of that
          echo.  Gain-out is the volume of the output.  For exam-
          ple, this will make it sound as if there are  twice  as
          many instruments as are actually playing:

             play lead.aiff echo 0.8 0.88 60 0.4

          If the delay is  very  short,  then  it  sounds like  a
          (metallic) robot playing music:

             play lead.aiff echo 0.8 0.88 6 0.4

          A  longer  delay will sound like an open air concert in
          the mountains:

             play lead.aiff echo 0.8 0.9 1000 0.3

          One mountain more, and:

             play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25

     echos gain-in gain-out <delay decay>
          Add a sequence of echoes  to  the  audio.   Each  delay
          decay  pair  gives  the  delay  in milliseconds and the
          decay (relative to gain-in) of that echo.  Gain-out  is
          the volume of the output.

          Like the echo effect, echos stands for `ECHO in Sequel',
          that is the first echos takes the input, the second the
          input  and the first echos, the third the input and the
          first and the second echos, ... and so on.  Care should
          be  taken using many echos; a single echos has the same
          effect as a single echo.

          The sample will be bounced twice in symmetric echos:

             play lead.aiff echos 0.8 0.7 700 0.25 700 0.3

          The sample will be bounced twice in asymmetric echos:

             play lead.aiff echos 0.8 0.7 700 0.25 900 0.3

          The sample will sound as if played in a garage:

             play lead.aiff echos 0.8 0.7 40 0.25 63 0.3

     equalizer frequency[k] width[q|o|h|k] gain
          Apply a  two-pole  peaking  equalisation  (EQ)  filter.
          With  this  filter,  the  signal-level  at and around a
          selected  frequency  can  be  increased  or  decreased,
          whilst  (unlike band-pass and band-reject filters) that
          at all other frequencies is unchanged.

          frequency gives the filter's central frequency  in  Hz,
          width,  the  band-width,  and gain the required gain or
          attenuation in dB.  Beware of  Clipping  when  using  a
          positive gain.

          In  order  to produce complex equalisation curves, this
          effect can be given several times, each with a  differ-
          ent central frequency.

          The filter is described in detail in [1].

          This effect supports the --plot global option.

          See  also  bass  and  treble  for shelving equalisation
          effects.

     fade [type] fade-in-length [stop-time [fade-out-length]]
          Apply a fade effect to the beginning, end, or  both  of
          the audio.

          An  optional  type can be specified to select the shape
          of the fade curve: q for quarter of a sine wave, h  for
          half  a sine wave, t for linear (`triangular') slope, l
          for logarithmic, and  p  for  inverted  parabola.   The
          default is logarithmic.

          A  fade-in  starts  from the first sample and ramps the
          signal level from 0 to full volume over  fade-in-length
          seconds.  Specify 0 seconds if no fade-in is wanted.

          For fade-outs, the audio will be truncated at stop-time
          and the signal level will be ramped  from  full  volume
          down  to  0  starting at fade-out-length seconds before
          the stop-time.  If fade-out-length is not specified, it
          defaults to the same value as fade-in-length.  No fade-
          out is performed if stop-time is not specified.  If the
          file  length  can  be  determined  from  the input file
          header and length-changing effects are not  in  effect,
          then  0  may be specified for stop-time to indicate the
          usual case of a fade-out that ends at the  end  of  the
          input audio stream.

          All times can be specified in either periods of time or
          sample counts.  To specify time periods, use the format
          hh:mm:ss.frac.  To specify using sample counts,
          specify the number of samples and append the letter `s'
          to the sample count (for example `8000s').
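
          The relationship between the two notations is simply
          seconds multiplied by the sample rate.  As a rough sketch
          (the helper name time_to_samples is hypothetical, and it
          assumes that shorter forms such as mm:ss or plain seconds
          are acceptable as input to the helper itself):

```shell
# time_to_samples TIME RATE: convert hh:mm:ss.frac (or mm:ss, or seconds)
# to a sample count with the `s' suffix. Hypothetical helper, not part of SoX.
time_to_samples() {
    awk -v t="$1" -v rate="$2" 'BEGIN {
        n = split(t, f, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + f[i]
        printf "%ds\n", secs * rate + 0.5    # round to the nearest sample
    }'
}

time_to_samples 00:01:30.5 44100
```

          The printed value could then be used wherever a fade
          length or stop-time is expected, provided the rate given
          matches the sample rate of the audio being processed.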

          See also the splice effect.

     fir [coefs-file|coefs]
          Use  SoX's FFT convolution engine with given FIR filter
          coefficients.  If a single argument is given then  this
          is  treated as the name of a file containing the filter
          coefficients (white-space separated;  may  contain  `#'
          comments).   If  the  given  filename  is `-', or if no
          argument is given, then the coefficients are read  from
          the  `standard  input' (stdin); otherwise, coefficients
          may be given on the command line.  Examples:

             sox infile outfile fir 0.0195 -0.082 0.234 0.891 -0.145 0.043

             sox infile outfile fir coefs.txt

          with coefs.txt containing

             # HP filter
             # freq=10000
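
          A coefficients file can also be generated rather than
          written by hand.  The sketch below prints an N-tap moving
          average (a crude low-pass filter, chosen only because its
          coefficients are trivial to compute); the helper name
          ma_coefs is hypothetical:

```shell
# ma_coefs N: print an N-tap moving-average FIR coefficient list, one
# coefficient per line, preceded by a `#' comment line.
# Hypothetical helper for illustration only; it is not part of SoX.
ma_coefs() {
    awk -v n="$1" 'BEGIN {
        printf "# %d-tap moving average\n", n
        for (i = 0; i < n; i++) printf "%.6f\n", 1 / n
    }'
}

ma_coefs 4
```

          The output could be saved to a file and given as the
          single fir argument, or, following the stdin rule above,
          piped in with a command along the lines of
          ma_coefs 4 | sox infile outfile fir -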

     flanger [delay depth regen width speed shape phase interp]
          Apply a flanging effect to the audio.  See  [3]  for  a
          detailed description of flanging.

          All parameters are optional (right to left).
                Range     Default   Description
      delay     0 - 30       0      Base delay in milliseconds.
      depth     0 - 10       2      Added swept delay in milliseconds.
      regen    -95 - 95      0      Percentage regeneration (delayed
                                    signal feedback).
      width    0 - 100      71      Percentage of delayed signal mixed
                                    with original.
      speed    0.1 - 10     0.5     Sweeps per second (Hz).
      shape                 sin     Swept wave shape: sine|triangle.
      phase    0 - 100      25      Swept wave percentage phase-shift
                                    for multi-channel (e.g. stereo)
                                    flange; 0 = 100 = same phase on
                                    each channel.
      interp                lin     Digital delay-line interpolation:
                                    linear|quadratic.

     gain [-e|-B|-b|-r] [-n] [-l|-h] [gain-dB]
          Apply amplification or attenuation to the audio signal,
          or, in some cases, to some of its channels.  Note  that
          use  of any of -e, -B, -b, -r, or -n requires temporary
          file space to store the audio to be processed,  so  may
          be unsuitable for use with `streamed' audio.

          Without  other  options,  gain-dB is used to adjust the
          signal power level by the given number of dB:  positive
          amplifies  (beware  of  Clipping), negative attenuates.
          With other options, the gain-dB amplification or atten-
          uation  is (logically) applied after the processing due
          to those options.

          Given the -e option, the levels of the  audio  channels
          of  a multi-channel file are `equalised', i.e.  gain is
          applied to all channels other than that with the  high-
          est  peak level, such that all channels attain the same
          peak level (but, without also giving -n, the  audio  is
          not `normalised').

          The  -B (balance) option is similar to -e, but with -B,
          the RMS level is used instead of the  peak  level.   -B
          might  be used to correct stereo imbalance caused by an
          imperfect  record  turntable  cartridge.    Note   that
          unlike -e, -B might cause some clipping.

          -b  is  similar to -B but has clipping protection, i.e.
          if necessary  to  prevent  clipping  whilst  balancing,
          attenuation is applied to all channels.  Note, however,
          that in conjunction with -n, -B and -b are  synonymous.

          The  -r  option  is  used  in  conjunction with a prior
          invocation of gain with the -h option - see  below  for
          details.

          The  -n  option  normalises the audio to 0dB FSD; it is
          often used in conjunction with a  negative  gain-dB  to
          the  effect  that  the  audio  is normalised to a given
          level below 0dB.  For example,

             sox infile outfile gain -n

          normalises to 0dB, and

             sox infile outfile gain -n -3

          normalises to -3dB.
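
          To relate these dB figures to linear sample values, the
          usual amplitude convention, ratio = 10^(dB/20), can be
          applied.  The sketch below assumes that convention; the
          helper name db_to_ratio is hypothetical:

```shell
# db_to_ratio DB: print the linear amplitude ratio for a change of DB dB.
# Hypothetical helper; assumes the 20*log10 amplitude convention.
db_to_ratio() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(db / 20 * log(10)) }'
}

db_to_ratio -3    # peak, relative to full scale, after `gain -n -3'
db_to_ratio 6     # amplitude factor for a gain-dB of 6
```

          A ratio above 1 on audio that already peaks near full
          scale is what leads to clipping.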

          The -l option invokes a simple limiter, e.g.

             sox infile outfile gain -l 6

          will apply 6dB of gain but never clip.  Note that  lim-
          iting  more than a few dBs more than occasionally (in a
          piece of audio) is not  recommended  as  it  can  cause
          audible  distortion.  See the compand effect for a more
          capable limiter.

          The -h option is used to apply gain  to  provide  head-
          room for subsequent processing.  For example, with

             sox infile outfile gain -h bass +6

          6dB  of  attenuation  will be applied prior to the bass
          boosting effect thus ensuring that it  will  not  clip.
          Of  course,  with bass, it is obvious how much headroom
          will be needed, but with  other  effects  (e.g.   rate,
          dither)  it  is not always as clear.  Another advantage
          of using gain -h rather than an  explicit  attenuation,
          is  that  if  the  headroom  is  not used by subsequent
          effects, it can be reclaimed with gain -r, for example:

             sox infile outfile gain -h bass +6 rate 44100 gain -r

          The  above  effects  chain guarantees never to clip nor
          amplify; it attenuates if necessary  to  prevent  clip-
          ping, but by only as much as is needed to do so.

          Output  formatting  (dithering and bit-depth reduction)
          also requires headroom (which cannot  be  `reclaimed'),

             sox infile outfile gain -h bass +6 rate 44100 gain -rh dither

          Here,  the  second gain invocation reclaims as much of
          the headroom as it can from the preceding effects,  but
          retains  as  much  headroom as is needed for subsequent
          processing.  The SoX global option -G can be  given  to
          automatically invoke gain -h and gain -r.

          See also the norm and vol effects.

     highpass|lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
          Apply  a  high-pass  or  low-pass filter with 3dB point
          frequency.  The filter can be either single-pole  (with
          -1),  or  double-pole (the default, or with -2).  width
          applies only to double-pole filters; the default is Q =
          0.707  and  gives  a Butterworth response.  The filters
          roll off at 6dB per pole per octave (20dB per pole  per
          decade).   The  double-pole  filters  are  described in
          detail in [1].

          These effects support the --plot global option.

          See also sinc for filters with a steeper roll-off.

     ladspa module [plugin] [argument...]
          Apply a LADSPA  [5]  (Linux  Audio  Developer's  Simple
          Plugin  API)  plugin.   Despite the name, LADSPA is not
          Linux-specific, and a wide range of effects  is  avail-
          able  as  LADSPA plugins, such as cmt [6] (the Computer
          Music Toolkit) and  Steve  Harris's  plugin  collection
          [7].  The first argument is the plugin module, the sec-
          ond the name of the plugin (a module can  contain  more
          than  one  plugin)  and any other arguments are for the
          control ports of the plugin. Missing arguments are sup-
          plied  by default values if possible. Only plugins with
          at most one audio input and one audio output  port  can
          be   used.    If   found,   the   environment  variable
          LADSPA_PATH will be used as search path for plugins.

     loudness [gain [reference]]
          Loudness control - similar to the gain effect, but pro-
          vides  equalisation for the human auditory system.  See
  for  a  detailed
          description  of  loudness.  The gain is adjusted by the
          given gain parameter (usually negative) and the  signal
          equalised according to ISO 226 w.r.t. a reference level
          of 65dB, though an alternative reference level  may  be
          given if the original audio has been equalised for some
          other optimal level.  A default gain of -10dB  is  used
          if a gain value is not given.

          See also the gain effect.

     lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
          Apply  a  low-pass  filter.  See the description of the
          highpass effect for details.

     mcompand "attack1,decay1{,attack2,decay2}
          [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
          [gain [initial-volume-dB [delay]]]"  {crossover-freq[k]
          "attack1,..."}

          The  multi-band compander is similar to the single-band
          compander but the audio is  first  divided  into  bands
          using  Linkwitz-Riley  cross-over  filters  and a sepa-
          rately specifiable compander run on each band.  See the
          compand  effect  for  the definition of its parameters.
          Compand parameters are specified between double  quotes
          and  the  crossover frequency for that band is given by
          crossover-freq; these can be repeated to create  multi-
          ple bands.

          For example, the following (one long) command shows how
          multi-band companding is typically used in FM radio:

             play track1.wav gain -3 sinc 8000- 29 100 mcompand \
               "0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
               "0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
               "0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
               "0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
               "0,0.025 -38,-31,-28,-28,-0,-25" \
               gain 15 highpass 22 highpass 22 sinc -n 255 -b 16 -17500 \
               gain 9 lowpass -1 17801

          The audio file is played  with  a  simulated  FM  radio
          sound  (or  broadcast  signal  condition if the lowpass
          filter at the end is skipped).  Note that the  pipeline
          is set up with US-style 75us pre-emphasis.

          See also compand for a single-band companding effect.

     mixer [ -l|-r|-f|-b|-1|-2|-3|-4|n{,n} ]
          Reduce  the  number  of  audio  channels  by  mixing or
          selecting channels, or increase the number of  channels
          by duplicating channels.  Note: this effect operates on
          the audio channels within the  SoX  effects  processing
          chain;  it  should  not  be confused with the -m global
          option (where multiple files  are  mix-combined  before
          entering the effects chain).

          When  reducing the number of channels it is possible to
          use the -l, -r, -f, -b, -1, -2, -3,  or  -4  option  to
          select  only the left, right, front, back channel(s) or
          specific channel for the output  instead  of  averaging
          the channels.  The -l and -r options will do  averaging
          in quad-channel files, so select the exact  channel  to
          prevent this.

          The mixer effect can also be invoked with up to 16 num-
          bers, separated by commas, which specify the proportion
          (0  = 0% and 1 = 100%) of each input channel that is to
          be mixed into  each  output  channel.   In  two-channel
          mode,  4 numbers are given: l -> l, l -> r, r -> l, and
          r -> r, respectively.  In four-channel mode, the  first
          4  numbers  give  the  proportions  for the left-front
          output channel, as follows: lf -> lf, rf -> lf,  lb  ->
          lf,  and  rb  -> lf.  The next 4 give the right-front
          output in the same order, then left-back and right-back.

          It is also possible to use the 16 numbers to expand  or
          reduce  the  channel  count;  just specify 0 for unused
          channels.

          Finally, certain reduced combinations of numbers can be
          specified for certain input/output channel combinations:

             In Ch   Out Ch   Num   Mappings
               2       1       2    l -> l, r -> l
               2       2       1    adjust balance
               4       1       4    lf -> l, rf -> l, lb -> l, rb -> l
               4       2       2    lf -> l&rf -> r, lb -> l&rb -> r
               4       4       1    adjust balance
               4       4       2    front balance, back balance

          See also remix for a mixing  effect  that  handles  any
          number of channels.

     noiseprof [profile-file]
          Calculate  a  profile  of  the  audio  for use in noise
          reduction.  See the description of the noisered  effect
          for details.

     noisered [profile-file [amount]]
          Reduce  noise in the audio signal by profiling and fil-
          tering.  This effect is moderately effective at  remov-
          ing  consistent  background  noise such as hiss or hum.
          To use it, first run SoX with the noiseprof effect on a
          section of audio that ideally would contain silence but
          in fact contains noise - such  sections  are  typically
          found  at  the  beginning  or  the  end of a recording.
          noiseprof will write out a noise  profile  to  profile-
          file,  or  to  stdout  if  no profile-file or if `-' is
          given.  E.g.

             sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile

          To actually remove the noise, run SoX again, this  time
          with  the  noisered  effect; noisered will reduce noise
          according to a noise profile (which  was  generated  by
          noiseprof), from profile-file, or from stdin if no pro-
          file-file or if `-' is given.  E.g.

             sox speech.wav cleaned.wav noisered speech.noise-profile 0.3

          How much  noise  should  be  removed  is  specified  by
          amount - a number between 0 and 1 with a default of 0.5.
          Higher numbers will remove more  noise  but  present  a
          greater likelihood of removing wanted components of the
          audio signal.  Before replacing an  original  recording
          with a noise-reduced version, experiment with different
          amount values to find the optimal one for  your  audio;
          use  headphones  to  check  that you are happy with the
          results, paying particular attention  to  quieter  sec-
          tions of the audio.

          On  most systems, the two stages - profiling and reduc-
          tion - can be combined using a pipe, e.g.

             sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered
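
          Where a pipe is not convenient, the two stages  can  be
          wrapped  in a small shell function.  The following is a
          minimal sketch only: the function name denoise, the SOX
          variable and the profile file naming are all hypotheti-
          cal choices made for this illustration, and  the  sketch
          assumes  that  the start of the recording contains only
          noise.

```shell
# Hypothetical wrapper (not part of SoX) around the two-stage procedure.
# Set SOX=echo for a dry run that only prints the arguments.
SOX=${SOX:-sox}

# denoise IN OUT [PROFILE_SECONDS [AMOUNT]]
# Profiles the first PROFILE_SECONDS of IN (assumed to hold only noise),
# then writes a noise-reduced copy of IN to OUT.
denoise() {
    src=$1
    dst=$2
    secs=${3:-1}                      # length of the noise-only section
    amount=${4:-0.5}                  # same default as noisered itself
    profile=$src.noise-profile        # arbitrary name for the profile file
    "$SOX" "$src" -n trim 0 "$secs" noiseprof "$profile" &&
        "$SOX" "$src" "$dst" noisered "$profile" "$amount"
}
```

          For example, denoise speech.wav cleaned.wav  1.5  0.3
          runs  the same two commands as shown earlier, except
          that the profile is written to speech.wav.noise-profile.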

     norm [dB-level]
          Normalise the audio.  norm is just an  alias  for  gain
          -n; see the gain effect for details.

          Note that norm's -i and -b options are deprecated (hav-
          ing been superseded by gain -en  and  gain  -B  respec-
          tively) and will be removed in a future release.

     oops Out  Of Phase Stereo effect.  Mixes stereo to twin-mono
          where each mono channel contains the difference between
          the  left and right stereo channels.  This is sometimes
          known as the `karaoke'  effect  as  it  often  has  the
          effect  of  removing  most  or all of the vocals from a
          recording.

     overdrive [gain(20) [colour(20)]]
          Non-linear distortion.  The colour  parameter  controls
          the  amount of even harmonic content in the over-driven
          output.

     pad { length[@position] }
          Pad the audio with silence, at the beginning, the  end,
          or any specified points through the audio.  Both length
          and position can specify a time or, if appended with an
          `s',  a  number  of  samples.   length is the amount of
          silence to insert and  position  the  position  in  the
          input  audio  stream at which to insert it.  Any number
          of lengths and positions  may  be  specified,  provided
          that a specified position is not less than the previous
          one.  position is  optional  for  the  first  and  last
          lengths  specified  and  if  omitted  correspond to the
          beginning and the end of the audio  respectively.   For
          example,  pad  1.5 1.5 adds 1.5 seconds of silence pad-
          ding at each end of the audio,  whilst  pad  4000s@3:00
          inserts  4000  samples  of  silence  3 minutes into the
          audio.  If silence is wanted only at  the  end  of  the
          audio,  specify  either  the  end position or specify a
          zero-length pad at the start.

          See also delay for an effect that can  add  silence  at
          the  beginning  of  the  audio  on a channel-by-channel
          basis.

     phaser gain-in gain-out delay decay speed [-s|-t]
          Add a phasing effect to  the  audio.   See  [3]  for  a
          detailed description of phasing.

          delay/decay/speed  gives  the delay in milliseconds and
          the decay (relative to gain-in) with a modulation speed
          in  Hz.   The  modulation  is either sinusoidal (-s)  -
          preferable for multiple instruments, or triangular (-t)
          -  gives  single  instruments a sharper phasing effect.
          The decay should be less than 0.5  to  avoid  feedback,
          and  usually  no less than 0.1.  Gain-out is the volume
          of the output.

          For example:

             play snare.flac phaser 0.8 0.74 3 0.4 0.5 -t

          Gentler:

             play snare.flac phaser 0.9 0.85 4 0.23 1.3 -s

          A popular sound:

             play snare.flac phaser 0.89 0.85 1 0.24 2 -t

          More severe:

             play snare.flac phaser 0.6 0.66 3 0.6 2 -t

     pitch [-q] shift [segment [search [overlap]]]
          Change the audio pitch (but not tempo).

          shift gives the pitch shift  as  positive  or  negative
          `cents'  (i.e.  100ths  of  a semitone).  See the tempo
          effect for a description of the other parameters.
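
          To relate a shift in cents to the  resulting  change  in
          frequency,  the  following sketch may help.  It assumes
          equal temperament (12 semitones,  and  hence  1200
          cents, per octave); the helper name cents_to_ratio is
          hypothetical and is not part of SoX.

```shell
# cents_to_ratio CENTS
# Hypothetical helper: frequency ratio produced by a shift of CENTS,
# assuming equal temperament (1200 cents per octave).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(c / 1200 * log(2)) }'
}

cents_to_ratio 100     # up one semitone
cents_to_ratio -1200   # down one octave
```

          A ratio above 1 raises the pitch and a ratio  below  1
          lowers it.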

          See also the speed and tempo effects.

     rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
          Change the  audio  sampling  rate  (i.e.  resample  the
          audio)  to  any given RATE (even non-integer if this is
          supported by the output file format)  using  a  quality
          level defined as follows:
                   Quality   Band-  Rej dB   Typical Use
                             width
             -q     quick     n/a   ~=30 @   playback on
                                     Fs/4    ancient hardware
             -l      low      80%    100     playback on old
                                             hardware
             -m    medium     95%    100     audio playback
             -h     high      95%    125     16-bit mastering
                                             (use with dither)
             -v   very high   95%    175     24-bit mastering

          where Band-width is the percentage of  the  audio  fre-
          quency  band  that is preserved and Rej dB is the level
          of noise rejection.  Increasing  levels  of  resampling
          quality  come  at  the expense of increasing amounts of
          time to process the audio.  If  no  quality  option  is
          given, the quality level used is `high'.

          The  `quick'  algorithm  uses  cubic interpolation; all
          others use band-limited interpolation.  By default, all
          algorithms   have   a   `linear'  phase  response;  for
          `medium', `high' and `very high', the phase response is
          configurable (see below).

          The  rate  effect  is invoked automatically if SoX's -r
          option specifies a rate that is different  to  that  of
          the  input  file(s).   Alternatively, if this effect is
          given explicitly, then SoX's  -r  option  need  not  be
          given.   For  example,  the  following two commands are
          equivalent:
             sox input.wav -r 48k output.wav bass -3
             sox input.wav        output.wav bass -3 rate 48k

          though the second command is more flexible as it allows
          rate  options to be given, and allows the effects to be
          ordered arbitrarily.
                            *        *        *

          Warning: technically detailed discussion follows.

          The simple quality selection described  above  provides
          settings that satisfy the needs of the vast majority of
          resampling tasks.  Occasionally,  however,  it  may  be
          desirable to fine-tune the resampler's filter response;
          this  can  be  achieved  using   override options,   as
          detailed in the following table:
      -M/-I/-L     Phase response = minimum/intermediate/linear
      -s           Steep filter (band-width = 99%)
      -a           Allow aliasing/imaging above the pass-band
      -b 74-99.7   Any band-width %
      -p 0-100     Any phase response (0 = minimum, 25 = intermediate,
                   50 = linear, 100 = maximum)

          N.B.  Override options cannot be used with the  `quick'
          or `low' quality algorithms.

          All  resamplers  use  filters that can sometimes create
          `echo' (a.k.a.   `ringing')  artefacts  with  transient
          signals such as those that occur with `finger snaps' or
          other highly percussive  sounds.   Such  artefacts  are
          much  more  noticeable  to  the human ear if they occur
          before the transient (`pre-echo') than  if  they  occur
          after it (`post-echo').  Note that  the  frequency  of  any
          such artefacts is related to the smaller of the  origi-
          nal and new sampling rates but that if this is at least
          44.1kHz, then the artefacts will lie outside the  range
          of human hearing.

          A  phase  response  setting  may be used to control the
          distribution of any transient echo  between  `pre'  and
          `post':  with  minimum  phase, there is no pre-echo but
          the longest post-echo; with linear phase, pre and  post
          echo  are  in  equal  amounts (in signal terms, but not
          audibility  terms);  the  intermediate  phase   setting
          attempts  to  find  the  best compromise by selecting a
          small length (and  level)  of  pre-echo  and  a  medium
          length of post-echo.

          Minimum,  intermediate,  or  linear  phase  response is
          selected using the -M, -I, or -L option; a custom phase
          response  can be created with the -p option.  Note that
          phase responses between `linear' and `maximum' (greater
          than 50) are rarely useful.

          A resampler's band-width setting determines how much of
          the frequency content of the  original  signal  (w.r.t.
          the  original  sample rate when up-sampling, or the new
          sample rate when  down-sampling)  is  preserved  during
          conversion.   The  term `pass-band' is used to refer to
          all frequencies up to the band-width  point  (e.g.  for
          44.1kHz  sampling  rate, and a resampling band-width of
          95%, the  pass-band  represents  frequencies  from  0Hz
          (D.C.)  to  circa  21kHz).   Increasing the resampler's
          band-width results  in  a  slower  conversion  and  can
          increase transient echo artefacts (and vice versa).
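
          The pass-band arithmetic in the example  above  can  be
          sketched  as  follows.   The  helper  name passband_hz is
          hypothetical (it is not part of SoX), and  the  sketch
          assumes  that the band-width percentage is taken rela-
          tive to half of the relevant sampling rate.

```shell
# passband_hz IN_RATE OUT_RATE BANDWIDTH_PERCENT
# Hypothetical helper: upper edge of the pass-band in Hz, taken as the
# given percentage of half the smaller of the two sampling rates.
passband_hz() {
    awk -v a="$1" -v b="$2" -v pct="$3" 'BEGIN {
        rate = (a < b) ? a : b   # original rate when up-sampling, new rate when down-sampling
        printf "%.1f\n", rate / 2 * pct / 100
    }'
}

passband_hz 96000 44100 95   # down-sampling to 44.1kHz at the default band-width
passband_hz 44100 96000 99   # up-sampling from 44.1kHz with the steep filter
```

          The first call gives a figure just below 21kHz, in line
          with the `circa 21kHz' quoted above.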

          The -s `steep filter' option changes resampling band-width
          from the default 95% (based on the 3dB  point),  to
          99%.   The -b option allows the band-width to be set to
          any value in the range 74-99.7 %, but note  that  band-
          width  values  greater than 99% are not recommended for
          normal use as they can cause excessive transient  echo.

          If  the -a option is given, then aliasing/imaging above
          the pass-band is allowed.  For  example,  with  44.1kHz
          sampling rate, and a resampling band-width of 95%, this
          means that frequency content above 21kHz  can  be  dis-
          torted;  however,  since  this  is  above the pass-band
          (i.e.  above the highest frequency of interest/audibil-
          ity),  this  may  not  be  a  problem.  The benefits of
          allowing aliasing/imaging are reduced processing  time,
          and  reduced (by almost half) transient echo artefacts.
          Note that if this option is  given,  then  the  minimum
          band-width allowable with -b increases to 85%.

          Examples:

             sox input.wav -b 16 output.wav rate -s -a 44100 dither -s

          default  (high)  quality  resampling;  overrides: steep
          filter, allow aliasing; to 44.1kHz sample rate;  noise-
          shaped dither to 16-bit WAV file.

             sox input.wav -b 24 output.aiff rate -v -I -b 90 48k

          very  high  quality resampling; overrides: intermediate
          phase, band-width 90%; to 48k sample rate; store output
          to 24-bit AIFF file.
                            *        *        *

          The  pitch,  speed  and  tempo effects all use the rate
          effect at their core.

     remix [-a|-m|-p] <out-spec>
          out-spec  = in-spec{,in-spec} | 0
          in-spec   = [in-chan][-[in-chan2]][vol-spec]
          vol-spec  = p|i|v[volume]

          Select and mix input audio channels into  output  audio
          channels.   Each  output channel is specified, in turn,
          by a given out-spec: a list of contributing input chan-
          nels and volume specifications.

          Note  that  this  effect operates on the audio channels
          within the SoX effects processing chain; it should  not
          be  confused  with the -m global option (where multiple
          files are  mix-combined  before  entering  the  effects
          chain).

          An out-spec contains comma-separated input channel-num-
          bers and hyphen-delimited channel-number ranges; alter-
          natively,  0  may  be  given  to create a silent output
          channel.  For example,

             sox input.wav output.wav remix 6 7 8 0

          creates an output file with four channels, where  chan-
          nels  1, 2, and 3 are copies of channels 6, 7, and 8 in
          the input file, and channel 4 is silent.  Whereas

             sox input.wav output.wav remix 1-3,7 3

          creates a (somewhat bizarre) stereo output  file  where
          the  left channel is a mix-down of input channels 1, 2,
          3, and 7, and the right channel  is  a  copy  of  input
          channel 3.

          Where  a  range  of  channels is specified, the channel
          numbers to  the  left  and  right  of  the  hyphen  are
          optional  and  default  to 1 and to the number of input
          channels respectively. Thus

             sox input.wav output.wav remix -

          performs a mix-down of all input channels to mono.

          By default, where an output channel is mixed from  mul-
          tiple  (n)  input  channels, each input channel will be
          scaled by a factor of 1/n.  Custom mixing volumes can be
          set  by  following  a  given  input channel or range of
          input channels with a vol-spec (volume  specification).
          This  is  one  of the letters p, i, or v, followed by a
          volume number, the meaning  of  which  depends  on  the
          given letter and is defined as follows:
              Letter   Volume number        Notes
                p      power adjust in dB   0 = no change
                i      power adjust in dB   As `p', but
                                            invert the audio
                v      voltage multiplier   1 = no change,
                                            0.5 ~= 6dB
                                            attenuation, 2
                                            ~= 6dB gain, -1
                                            = invert
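
          The approximate dB figures in the table can be  checked
          with  a sketch such as the following, which converts a
          `v' multiplier into the corresponding adjustment in dB
          as 20 x log10 of its magnitude.  The helper name
          vol_to_db is hypothetical and is not part of SoX.

```shell
# vol_to_db MULTIPLIER
# Hypothetical helper: convert a `v' voltage multiplier into the
# equivalent adjustment in dB (20 * log10 of the magnitude).
# A multiplier of 0 (silence) has no finite dB value and is not handled.
vol_to_db() {
    awk -v v="$1" 'BEGIN {
        if (v < 0) v = -v    # a negative multiplier only inverts the audio
        printf "%.2f\n", 20 * log(v) / log(10)
    }'
}

vol_to_db 0.5   # roughly 6dB of attenuation
vol_to_db 2     # roughly 6dB of gain
```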

          If  an out-spec includes at least one vol-spec then, by
          default, 1/n scaling is not applied to any other  chan-
          nels  in the same out-spec (though may be in other out-
          specs).  The -a (automatic)  option,  however,  can  be
          given  to  retain  the  automatic scaling in this case.
          For example,

             sox input.wav output.wav remix 1,2 3,4v0.8

          results in channel level multipliers of 0.5,0.5  1,0.8,
          whereas

             sox input.wav output.wav remix -a 1,2 3,4v0.8

          results   in   channel  level  multipliers  of  0.5,0.5
          0.5,0.8.

          The -m (manual) option disables  all  automatic  volume
          adjustments, so

             sox input.wav output.wav remix -m 1,2 3,4v0.8

          results in channel level multipliers of 1,1 1,0.8.

          The  volume  number  is optional and omitting it corre-
          sponds to no volume change; however, the only  case  in
          which  this  is  useful  is in conjunction with i.  For
          example, if input.wav is stereo, then

             sox input.wav output.wav remix 1,2i

          is a mono equivalent of the oops effect.

          If the -p option is given, then any automatic 1/n scal-
          ing  is  replaced  by 1/sqrt(n) (`power') scaling; this
          gives a louder mix but one that might occasionally clip.
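
          The difference between the two kinds of automatic scal-
          ing  can  be  seen numerically with the sketch below.
          It assumes that the default factor is 1/n and that the
          -p  factor  is 1/sqrt(n), where n is the number of input
          channels mixed into one output channel; the helper name
          mix_scale is hypothetical and is not part of SoX.

```shell
# mix_scale N [power]
# Hypothetical helper: per-channel scale factor assumed to apply when N
# input channels are mixed into one output channel.  Pass `power' as
# the second argument for the -p behaviour.
mix_scale() {
    awk -v n="$1" -v mode="${2:-}" 'BEGIN {
        if (mode == "power") printf "%.4f\n", 1 / sqrt(n)
        else                 printf "%.4f\n", 1 / n
    }'
}

mix_scale 4         # default scaling
mix_scale 4 power   # -p scaling
```

          With four channels the `power' factor  is  twice  the
          default  one,  which is why the resulting mix is louder
          and more likely to clip.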
                            *        *        *

          One use of the remix effect is to split an  audio  file
          into  a  set  of files, each containing one of the con-
          stituent channels (in order to perform subsequent  pro-
          cessing on individual audio channels).  Where more than
          a few channels are involved, a script such as the  fol-
          lowing (Bourne shell script) is useful:

          chans=`soxi -c "$1"`
          while [ $chans -ge 1 ]; do
             chans0=`printf %02i $chans`   # 2 digits hence up to 99 chans
             out=`echo "$1"|sed "s/\(.*\)\.\(.*\)/\1-$chans0.\2/"`
             sox "$1" "$out" remix $chans
             chans=`expr $chans - 1`
          done

          If  a file input.wav containing six audio channels were
          given, the  script  would  produce  six  output  files:
          input-01.wav, input-02.wav, ..., input-06.wav.

          See also mixer and swap for similar effects.

     repeat count
          Repeat  the  entire audio count times.  Requires tempo-
          rary file space to store  the  audio  to  be  repeated.
          Note  that repeating once yields two copies: the origi-
          nal audio and the repeated audio.

     reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
          [room-scale (100%) [stereo-depth (100%)
          [pre-delay (0ms) [wet-gain (0dB)]]]]]]

          Add  reverberation  to  the  audio using the `freeverb'
          algorithm.  A reverberation effect is sometimes  desir-
          able for concert halls that are too small or contain so
          many people that the  hall's  natural  reverberance  is
          diminished.   Applying  a small amount of stereo reverb
          to a (dry) mono signal will usually make it sound  more
          natural.  See [3] for a detailed description of
          reverberation.

          Note that this effect increases both the volume and the
          length  of  the  audio, so to prevent clipping in these
          domains, a typical invocation might be:

             play dry.wav gain -3 pad 0 3 reverb

          The -w option can be given to  select  only  the  `wet'
          signal, thus allowing it to be processed further, inde-
          pendently of the `dry' signal.  E.g.

             play -m voice.wav "|sox voice.wav -p reverse reverb -w reverse"

          for a reverse reverb effect.

     reverse
          Reverse the audio completely.  Requires temporary  file
          space to store the audio to be reversed.

     riaa Apply  RIAA  vinyl playback equalisation.  The sampling
          rate must be one of: 44.1, 48, 88.2, 96 kHz.

          This effect supports the --plot global option.

     silence [-l] above-periods [duration threshold[d|%]
          [below-periods duration threshold[d|%]]]

          Removes silence from the beginning, middle, or  end  of
          the  audio.   `Silence'  is  determined  by a specified
          threshold.

          The above-periods value is used to  indicate  if  audio
          should  be  trimmed  at  the  beginning of the audio. A
          value of zero indicates no silence  should  be  trimmed
          from  the  beginning. When specifying a non-zero above-
          periods, it trims audio up until it finds  non-silence.
          Normally, when trimming silence from beginning of audio
          the above-periods will be 1 but it can be increased  to
          higher  values to trim all audio up to a specific count
          of non-silence periods. For  example,  if  you  had  an
          audio file with two songs that each contained 2 seconds
          of silence before the song, you could specify an above-
          period  of  2 to strip out both silence periods and the
          first song.

          When above-periods is non-zero, you must also specify a
          duration and threshold. Duration indicates  the  amount
          of time that non-silence must  be  detected  before  it
          stops trimming audio. By increasing the duration, bursts
          of noise can be treated as silence and trimmed off.

          Threshold is used to indicate  what  sample  value  you
          should treat as silence.  For digital audio, a value of
          0 may be fine but for audio recorded from  analog,  you
          may  wish  to  increase  the value to account for back-
          ground noise.

          When optionally trimming silence from the  end  of  the
          audio,  you  specify  a  below-periods  count.  In this
          case, below-period means  to  remove  all  audio  after
          silence  is detected.  Normally, this will be a value of
          1 but it can be  increased  to  skip  over  periods  of
          silence  that  are  wanted.  For example, if you have a
          song with 2 seconds of silence in the middle and 2 sec-
          onds at the end, you could set below-period to a  value
          of 2 to skip over the silence  in  the  middle  of  the
          audio.

          For  below-periods,  duration  specifies  a  period  of
          silence that must exist before audio is not copied  any
          more.  By specifying a higher duration, silence that is
          wanted can be left in the audio.  For example,  if  you
          have a song with an expected 1 second of silence in the
          middle and 2 seconds of silence at the end, a  duration
          of  2  seconds  could  be  used to skip over the middle
          silence.

          Unfortunately, you must know the length of the  silence
          at the end of your audio file to trim off silence reli-
          ably.  A workaround is to use the  silence  effect  in
          combination  with the reverse effect.  By first revers-
          ing the audio, you can use the above-periods  to  reli-
          ably  trim  all audio from what looks like the front of
          the file.  Then reverse the file again to get  back  to
          normal.

          To  remove silence from the middle of a file, specify a
          below-periods that is negative.   This  value  is  then
          treated  as  a positive value and is also used to indi-
          cate the effect should restart processing as  specified
          by  the  above-periods, making it suitable for removing
          periods of silence in the middle of the audio.

          The option -l  indicates  that  below-periods  duration
          length  of audio should be left intact at the beginning
          of each period of silence.  This is useful, for example,
          if  you  want to remove long pauses between words but
          do not want to remove the pauses completely.

          The period counts are in  units  of  samples.  Duration
          counts  may  be  in the format of hh:mm:ss.frac, or the
          exact count of samples.  Threshold numbers may be  suf-
          fixed with d to indicate the value is in decibels, or %
          to indicate a percentage of maximum value of the sample
          value (0% specifies pure digital silence).
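
          To compare a threshold given as a percentage  with  one
          given in decibels, the following sketch may be useful.
          It assumes that the percentage is an  amplitude  ratio
          relative  to  full  scale, so that the equivalent value
          in dB is 20 x log10(percent/100); the helper name per-
          cent_to_db is hypothetical and is not part of SoX.

```shell
# percent_to_db PERCENT
# Hypothetical helper: express a threshold given as a percentage of
# full scale in decibels relative to full scale.
# 0% (pure digital silence) has no finite dB value and is not handled.
percent_to_db() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", 20 * log(p / 100) / log(10) }'
}

percent_to_db 1     # a 1% threshold
percent_to_db 0.1   # a 0.1% threshold
```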

          The following example shows how this effect can be used
          to start a recording that does not contain the delay at
          the  start  which  usually occurs between `pressing the
          record button' and the start of the performance:

             rec parameters filename other-effects silence 1 5 2%
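
          The reverse-based technique  for  trimming  silence  of
          unknown  length  from  the  end  of a file, described
          earlier in this section, can be sketched as  a  single
          effects chain.  The function name trim_trailing_silence
          and the SOX variable are hypothetical, and suitable dura-
          tion and threshold values depend on the recording.

```shell
# Hypothetical wrapper (not part of SoX).
# Set SOX=echo for a dry run that only prints the arguments.
SOX=${SOX:-sox}

# trim_trailing_silence IN OUT DURATION THRESHOLD
# Reverses the audio, trims what is now leading silence with an
# above-periods count of 1, then reverses the audio back again.
trim_trailing_silence() {
    "$SOX" "$1" "$2" reverse silence 1 "$3" "$4" reverse
}
```

          For example, trim_trailing_silence in.wav out.wav 0.1 1%
          removes  audio from the end of in.wav back to the last
          point at which the signal stays above 1% for at  least
          0.1 seconds.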

     sinc [-a att|-b beta] [-p phase|-M|-I|-L] [-t tbw|-n taps]
          [freqHP][-freqLP [-t tbw|-n taps]]
          Apply a sinc kaiser-windowed low-pass, high-pass, band-
          pass,  or band-reject filter to the signal.  The freqHP
          and freqLP parameters give the frequencies of  the  6dB
          points  of  a high-pass and low-pass filter that may be
          invoked individually, or together.  If both are  given,
          then freqHP < freqLP creates a band-pass filter, freqHP
          > freqLP creates a band-reject filter.

          The default stop-band attenuation of 120dB can be over-
          ridden with -a; alternatively, the kaiser-window `beta'
          parameter can be given directly with -b.

          The default transition band-width of 5%  of  the  total
          band  can  be  overridden  with  -t (and tbw in Hertz);
          alternatively, the number of filter taps can  be  given
          directly with -n.

          If  both  freqHP  and freqLP are given, then a -t or -n
          option given to the left of the frequencies applies  to
          both  frequencies;  one  of  these options given to the
          right of the frequencies applies only to freqLP.

          The -p, -M, -I, and -L  options  control  the  filter's
          phase response; see the rate effect for details.

          This effect supports the --plot global option.

     spectrogram [options]
          Create  a spectrogram of the audio; the audio is passed
          unmodified through  the  SoX  processing  chain.   This
          effect is optional - type sox --help and check the list
          of supported effects to see if it has been included.

          The spectrogram  is  rendered  in  a  Portable  Network
          Graphic  (PNG) file, and shows time in the X-axis, fre-
          quency in the Y-axis, and audio signal magnitude in the
          Z-axis.   Z-axis  values  are represented by the colour
          (or optionally the intensity) of the pixels in the  X-Y
          plane.   If the audio signal contains multiple channels
          then these are shown from top to bottom  starting  from
          channel 1 (which is the left channel for stereo audio).

          For example, if `my.wav' is a stereo file, then with

             sox my.wav -n spectrogram

          a spectrogram of the entire file will be created in the
          file `spectrogram.png'.  More often though, analysis of
          a smaller portion of the audio is required; e.g. with

             sox my.wav -n remix 2 trim 20 30 spectrogram

          the spectrogram shows information only from the  second
          (right)  channel, and of thirty seconds of audio start-
          ing from twenty seconds in.  To analyse a small portion
          of the frequency domain, the rate effect may be used, e.g.

             sox my.wav -n rate 6k spectrogram

          allows detailed analysis  of  frequencies  up  to  3kHz
          (half  the sampling rate) i.e. where the human auditory
          system is most sensitive.  With

             sox my.wav -n trim 0 10 spectrogram -x 600 -y 200 -z 100

          the given options control the size of the spectrogram's
          X,  Y  &  Z axes (in this case, the spectrogram area of
          the produced image will be 600 by 200  pixels  in  size
          and  the  Z-axis  range will be 100 dB).  Note that the
          produced image includes axes legends etc. and  so  will
          be a little larger than the specified spectrogram size.
          In this example:

             sox -n -n synth 6 tri 10k:14k spectrogram -z 100 -w kaiser

          an  analysis  `window'  with  high  dynamic  range   is
          selected  to  best  display  the spectrogram of a swept
          triangular wave.  For a similar example, append the
          following to the `chime' command in the description of the
          delay effect (above):

             rate 2k spectrogram -X 200 -Z -10 -w kaiser

          Options are also available to control the appearance
          (colour-set,  brightness,  contrast, etc.) and filename
          of the spectrogram; e.g. with

             sox my.wav -n spectrogram -m -l -o print.png

          a spectrogram is created suitable  for  printing  on  a
          `black and white' printer.


          -x num
               Change  the  (maximum) width (X-axis) of the spec-
               trogram from its default value of 800 pixels to  a
               given  number  between  100 and 5000.  See also -X
               and -d.

          -X num
               X-axis pixels/second; the default  is  auto-calcu-
               lated  to fit the given or known audio duration to
               the X-axis size, or 100 otherwise.   If  given  in
               conjunction with -d, this option affects the width
               of the  spectrogram;  otherwise,  it  affects  the
               duration  of  the  spectrogram.  num can be from 1
               (low time resolution) to 5000 (high  time  resolu-
               tion)  and need not be an integer.  SoX may make a
               slight adjustment to the given number for process-
               ing  quantisation  reasons; if so, SoX will report
               the actual number  used  (viewable  when  the  SoX
               global option -V is in effect).  See also -x and
               -d.

          -y num
               Sets the Y-axis size in pixels (per channel); this
               is  the  number  of  frequency  `bins' used in the
               Fourier analysis that  produces  the  spectrogram.
               N.B.  it can be slow to produce the spectrogram if
               this number is not one more than a  power  of  two
               (e.g.  129).  By default the Y-axis size is chosen
               automatically (depending on the number of channels).
               See -Y for an alternative way of setting the
               spectrogram height.
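
               A quick way to check a candidate -y value against
               the `one more than a power of two' advice is
               sketched below.  Both functions are hypothetical
               helpers, not part of SoX, and fast_height_at_most
               assumes an argument of at least 2.

```shell
# Hypothetical helper: succeed if $1 is one more than a power of two
# (2, 3, 5, 9, 17, ..., 129, 257, ...).
is_fast_height() {
    m=$(( $1 - 1 ))
    [ "$m" -ge 1 ] || return 1
    # Halve while even; a power of two reduces to exactly 1.
    while [ $(( m % 2 )) -eq 0 ]; do
        m=$(( m / 2 ))
    done
    [ "$m" -eq 1 ]
}

# Hypothetical helper: largest such value not exceeding $1.
fast_height_at_most() {
    p=1
    while [ $(( p * 2 + 1 )) -le "$1" ]; do
        p=$(( p * 2 ))
    done
    echo $(( p + 1 ))
}

is_fast_height 129 && echo "129 is one more than a power of two"
fast_height_at_most 200
```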

          -Y num
               Sets the  target  total  height  of  the  spectro-
               gram(s).   The default value is 550 pixels.  Using
               this option (and by default), SoX  will  choose  a
               height for individual spectrogram channels that is
               one more than a power of two, so the actual  total
               height  may  fall short of the given number.  How-
               ever, there is also a minimum height  per  channel
               so  if  there are many channels, the number may be
               exceeded.  See -y for an alternative way of setting
               the spectrogram height.

          -z num
               Z-axis  (colour)  range  in dB, default 120.  This
               sets the dynamic-range of the  spectrogram  to  be
               -num dBFS  to  0 dBFS.   Num  may range from 20 to
               180.    Decreasing    dynamic-range    effectively
               increases  the  `contrast' of the spectrogram dis-
               play, and vice versa.

          -Z num
               Sets the upper limit of the  Z-axis  in  dBFS.   A
               negative  num  effectively  increases the `bright-
               ness' of the spectrogram display, and vice  versa.

          -q num
               Sets  the  Z-axis quantisation, i.e. the number of
               different colours (or  intensities)  in  which  to
               render  Z-axis  values.   A  small number (e.g. 4)
               will give a `poster'-like effect making it  easier
               to  discern  magnitude  bands  of  similar  level.
               Small numbers also usually  result  in  small  PNG
               files.   The  number given specifies the number of
               colours  to  use  inside  the  Z-axis  range;  two
               colours are reserved to represent out-of-range
               values.

          -w name
               Window: Hann (default), Hamming, Bartlett, Rectan-
               gular  or  Kaiser.   The  spectrogram  is produced
               using the Discrete Fourier Transform  (DFT)  algo-
               rithm.   A significant parameter to this algorithm
               is the choice of `window function'.   By  default,
               SoX  uses the Hann window which has good all-round
               frequency-resolution and dynamic-range properties.
               For   better   frequency   resolution  (but  lower
               dynamic-range),  select  a  Hamming  window;   for
               higher dynamic-range (but poorer frequency-resolu-
               tion), select a Kaiser window.  Bartlett and Rect-
               angular windows are also available.

          -W num
               Window  adjustment parameter.  This can be used to
               make small adjustments to the Kaiser window shape.
               A  positive  number  (up  to  ten)  increases  its
               dynamic range, a negative number decreases it.

          -s   Allow slack overlapping of DFT windows.  This can,
               in  some  cases, increase image sharpness and give
               greater adherence to the  -x  value,  but  at  the
               expense of a little spectral loss.

          -m   Creates a monochrome spectrogram (the default is
               colour).

          -h   Selects a  high-colour  palette  -  less  visually
               pleasing  than  the default colour palette, but it
               may make it easier to differentiate different lev-
               els.   If  this option is used in conjunction with
               -m, the result will be a hybrid monochrome/colour
               palette.

          -p num
               Permute the colours in a colour or hybrid palette.
               The num parameter, from  1  (the  default)  to  6,
               selects the permutation.

          -l   Creates a `printer friendly' spectrogram with a
               light background (the default has a dark
               background).

          -a   Suppress  the  display of the axis lines.  This is
               sometimes useful in helping to  discern  artefacts
               at the spectrogram edges.

          -r   Raw spectrogram: suppress the display of axes and
               legends.

          -A   Selects an alternative, fixed colour-set.  This is
               provided  only for compatibility with spectrograms
               produced by another package.  It should  not  nor-
               mally  be used as it has some problems, not least,
               a lack of differentiation at the bottom end  which
               results in masking of low-level artefacts.

          -t text
               Set the image title - text to display above the
               spectrogram.

          -c text
               Set (or clear) the image comment - text to display
               below and to the left of the spectrogram.

          -o text
               Name of the spectrogram output PNG file, default
               `spectrogram.png'.

          Advanced Options:
          In order to process a smaller section of audio  without
          affecting  other  effects  or the output signal (unlike
          when the trim effect is used),  the  following  options
          may be used.

          -d duration
               This  option  sets the X-axis resolution such that
               audio with the given duration ([[HH:]MM:]SS)  fits
               the selected (or default) X-axis width.  For
               example,

                  sox input.mp3 output.wav spectrogram -d 1:00 stats

               creates a spectrogram showing the first minute of
               the audio, whilst the stats effect is applied to
               the entire audio signal.

               See also -X for an alternative way of setting  the
               X-axis resolution.
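
               The relationship between -d, -x, and the X-axis
               resolution can be sketched with a little
               arithmetic: fitting a duration into a width implies
               a resolution of width divided by duration.  The
               helpers below are hypothetical, not part of SoX;
               SoX may adjust the value slightly, as noted under
               -X, so treat the result as approximate.

```shell
# Hypothetical helper: convert a [[HH:]MM:]SS time to seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        n = split(t, part, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + part[i]
        print s
    }'
}

# Hypothetical helper: pixels per second needed to fit the given
# duration ($2) into the given X-axis width in pixels ($1).
px_per_second() {
    LC_ALL=C awk -v w="$1" -v d="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", w / d }'
}

px_per_second 800 1:00    # default width, one minute of audio
```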

          -S time
               Start  the  spectrogram  at the given point in the
               audio stream.  For example

                  sox input.aiff output.wav spectrogram -S 1:00

               creates a spectrogram showing all  but  the  first
               minute  of  the  audio  (the  output file however,
               receives the entire audio stream).

          For the ability to perform off-line processing of spec-
          tral data, see the stat effect.

     speed factor[c]
          Adjust  the  audio  speed  (pitch  and tempo together).
          factor is either the ratio of the new speed to the  old
          speed:  greater  than  1  speeds  up, less than 1 slows
          down, or, if appended with the letter `c',  the  number
          of cents (i.e. 100ths of a semitone) by which the pitch
          (and  tempo)  should  be  adjusted:  greater   than   0
          increases, less than 0 decreases.
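
          The two forms of factor are related: with 100 cents to
          the semitone and 12 semitones to the octave, 1200 cents
          corresponds to a doubling of speed.  The sketch below
          converts cents to the equivalent ratio; cents_to_ratio is
          a hypothetical helper, not part of SoX.

```shell
# Hypothetical helper: speed ratio equivalent to a change of $1 cents,
# computed as 2^(cents/1200).
cents_to_ratio() {
    LC_ALL=C awk -v c="$1" 'BEGIN { printf "%.6f\n", exp(c / 1200 * log(2)) }'
}

cents_to_ratio 100      # one semitone up
cents_to_ratio -1200    # one octave down
```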

          By default, the speed change is performed by resampling
          with the rate effect using its  default  quality/speed.
          For higher quality or higher speed resampling, in addi-
          tion to the speed effect, specify the rate effect  with

          the desired quality option.

          See also the pitch and tempo effects.

     splice  [-h|-t|-q] { position[,excess[,leeway]] }
          Splice  together  audio sections.  This effect provides
          two things over simple audio concatenation: a  (usually
          short)  cross-fade  is  applied at the join, and a wave
          similarity comparison is made  to  help  determine  the
          best place at which to make the join.

          One of the options -t, -h, or -q may be given to select
          the fade envelope as triangular (a.k.a. linear) (the
          default), half-cosine wave, or quarter-cosine wave
          respectively.

             Type   Audio          Fade level       Transitions
              t     correlated     constant gain    abrupt
              h     correlated     constant gain    smooth
              q     uncorrelated   constant power   smooth

          To perform a splice,  first  use  the  trim  effect  to
          select  the  audio  sections to be joined together.  As
          when performing a tape splice, the end of  the  section
          to  be  spliced  onto  should  be  trimmed with a small
          excess (default 0.005 seconds) of audio after the ideal
          joining  point.   The beginning of the audio section to
          splice on  should  be  trimmed  with  the  same  excess
          (before  the  ideal  joining point), plus an additional
          leeway (default 0.005 seconds).   SoX  should  then  be
          invoked  with the two audio sections as input files and
          the splice effect given with the position at  which  to
          perform the splice - this is the length of the first audio
          section (including the excess).

          For example, a long song begins with two  verses  which
          start  (as  determined  e.g.  by using the play command
          with the trim (start) effect)  at  times  0:30.125  and
          1:03.432.  The following commands cut out the first
          verse:

             sox too-long.wav part1.wav trim 0 30.130

          (5 ms excess, after the first verse starts)

             sox too-long.wav part2.wav trim 1:03.422

          (5 ms excess plus 5 ms leeway, before the second verse
          starts)

             sox part1.wav part2.wav just-right.wav splice 30.130
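
          The trim and splice positions used in this example follow
          directly from the two verse start times and the default
          excess and leeway.  The arithmetic can be sketched as
          below; splice_points is a hypothetical helper, not part
          of SoX, and takes times in seconds.

```shell
# Hypothetical helper: given the start times (in seconds) of the two
# joining points, print on one line:
#   end of part 1, start of part 2, splice position.
# Optional $3 and $4 override the excess and leeway (default 0.005 s).
splice_points() {
    LC_ALL=C awk -v v1="$1" -v v2="$2" \
        -v e="${3:-0.005}" -v l="${4:-0.005}" 'BEGIN {
        # Part 1 keeps the excess after the join; part 2 starts
        # excess plus leeway before it; the splice position is the
        # length of part 1 including the excess.
        printf "%.3f %.3f %.3f\n", v1 + e, v2 - e - l, v1 + e
    }'
}

splice_points 30.125 63.432
```

          The three values printed correspond to the 30.130,
          1:03.422 and 30.130 used in the commands above.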

          For another example, the SoX command

             play "|sox -n -p synth 1 sin %1" "|sox -n -p synth 1 sin %3"

          generates  and  plays  two  notes, but there is a nasty
          click at the transition; the click can  be  removed  by
          splicing  instead  of  concatenating the audio, i.e. by
          appending splice 1  to  the  command.  (Clicks  at  the
          beginning  and  end of the audio can be removed by pre-
          ceding the splice effect with fade q .01 2 .01).

          Provided  your  arithmetic  is  good  enough,  multiple
          splices  can  be performed with a single splice invoca-
          tion.  For example:

          # Audio Copy and Paste Over
          # acpo infile copy-start copy-stop paste-over-start outfile
          # All times measured in samples.
          rate=`soxi -r "$1"`
          e=`expr $rate '*' 5 / 1000`  # Using default excess
          l=$e                         # and leeway.
          sox "$1" piece.wav trim `expr $2 - $e - $l`s \
             `expr $3 - $2 + $e + $l + $e`s
          sox "$1" part1.wav trim 0 `expr $4 + $e`s
          sox "$1" part2.wav trim `expr $4 + $3 - $2 - $e - $l`s
          sox part1.wav piece.wav part2.wav "$5" splice \
             `expr $4 + $e`s \
             `expr $4 + $e + $3 - $2 + $e + $l + $e`s

          In the above Bourne shell script, two splices are  used
          to `copy and paste' audio.
                            *        *        *

          It  is also possible to use this effect to perform gen-
          eral cross-fades, e.g. to  join  two  songs.   In  this
          case, excess would typically be a number of seconds,
          the -q option would typically be given  (to  select  an
          `equal  power'  cross-fade),  and leeway should be zero
          (which is the default if -q is given).  For example, if
          f1.wav  and  f2.wav  are audio files to be cross-faded,

             sox f1.wav f2.wav out.wav splice -q $(soxi -D f1.wav),3

          cross-fades the files where the point of equal loudness
          is  3  seconds before the end of f1.wav, i.e. the total
          length of the cross-fade is 2 x 3 =  6  seconds  (Note:
          the $(...) notation is POSIX shell).

     stat [-s scale] [-rms] [-freq] [-v] [-d]
          Display  time and frequency domain statistical informa-
          tion about  the  audio.   Audio  is  passed  unmodified
          through the SoX processing chain.

          The information is output to the `standard error'
          (stderr) stream and is calculated, where n is the
          duration of the audio in samples, c is the number of
          audio channels, r is the audio sample rate, and x[k]
          represents the PCM value (in the range -1 to +1 by
          default) of each successive sample in the audio, as
          follows:

          Samples read        n x c
          Length (seconds)    n / r
          Scaled by           See -s below.
          Maximum amplitude   max(x[k])
                              The maximum sample value in the
                              audio; usually this will be a
                              positive number.
          Minimum amplitude   min(x[k])
                              The minimum sample value in the
                              audio; usually this will be a
                              negative number.
          Midline amplitude   1/2 min(x[k]) + 1/2 max(x[k])
          Mean norm           (1/n) sum(|x[k]|)
                              The average of the absolute value of
                              each sample in the audio.
          Mean amplitude      (1/n) sum(x[k])
                              The average of each sample in the
                              audio.  If this figure is non-zero,
                              then it indicates the presence of a
                              D.C. offset (which could be removed
                              using the dcshift effect).
          RMS amplitude       sqrt((1/n) sum(x[k]^2))
                              The level of a D.C. signal that would
                              have the same power as the audio's
                              average power.
          Maximum delta       max(|x[k] - x[k-1]|)
          Minimum delta       min(|x[k] - x[k-1]|)
          Mean delta          (1/(n-1)) sum(|x[k] - x[k-1]|)
          RMS delta           sqrt((1/(n-1)) sum((x[k] - x[k-1])^2))
          Rough frequency     In Hz.
          Volume Adjustment   The parameter to the vol effect which
                              would make the audio as loud as
                              possible without clipping.  Note: See
                              the discussion on Clipping above for
                              reasons why it is rarely a good idea
                              actually to do this.

          Note that the delta measurements are not applicable for
          multi-channel audio.
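
          To make the Mean norm, Mean amplitude, and RMS amplitude
          definitions concrete, the sketch below computes them for
          a list of sample values, one per line, already scaled to
          the range -1 to +1.  It is an illustration of the
          formulas only; stat_sketch is a hypothetical helper and
          does not reproduce the output format of the stat effect.

```shell
# Hypothetical helper: read one sample value per line on standard
# input and print three of the statistics described above.
stat_sketch() {
    LC_ALL=C awk '
        {
            n++
            sum  += $1                       # for Mean amplitude
            norm += ($1 < 0 ? -$1 : $1)      # for Mean norm
            sq   += $1 * $1                  # for RMS amplitude
        }
        END {
            if (n == 0) exit 1
            printf "Mean norm: %.6f\n", norm / n
            printf "Mean amplitude: %.6f\n", sum / n
            printf "RMS amplitude: %.6f\n", sqrt(sq / n)
        }'
}

printf '%s\n' 0.5 -0.5 0.25 -0.25 | stat_sketch
```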

          The -s option can be used to scale the input data by  a
          given factor.  The default value of scale is 2147483647
          (i.e. the maximum value of a  32-bit  signed  integer).
          Internal  effects always work with signed long PCM data
          and so the value should relate to this fact.

          The -rms option will convert all output average  values
          to `root mean square' format.

          The -v option displays only the `Volume Adjustment'
          value.

          The -freq option calculates the input's power  spectrum
          (4096  point  DFT)  instead  of  the  statistics listed
          above.  This should only be used with a single  channel
          audio file.

          The  -d option displays a hex dump of the 32-bit signed
          PCM data audio  in  SoX's  internal  buffer.   This  is
          mainly  used  to  help  track down endian problems that
          sometimes occur in cross-platform versions of SoX.

          See also the stats effect.

     stats [-b bits|-x bits|-s scale] [-w window-time]
          Display time domain statistical information  about  the
          audio  channels; audio is passed unmodified through the
          SoX processing chain.  Statistics  are  calculated  and
          displayed for each audio channel and, where applicable,
          an overall figure is also given.

          For example, for a typical well-mastered  stereo  music
          file:

                              Overall     Left      Right
                 DC offset   0.000803 -0.000391  0.000803
                 Min level  -0.750977 -0.750977 -0.653412
                 Max level   0.708801  0.708801  0.653534
                 Pk lev dB      -2.49     -2.49     -3.69
                 RMS lev dB    -19.41    -19.13    -19.71
                 RMS Pk dB     -13.82    -13.82    -14.38
                 RMS Tr dB     -85.25    -85.25    -82.66
                 Crest factor       -      6.79      6.32
                 Flat factor     0.00      0.00      0.00
                 Pk count           2         2         2
                 Bit-depth      16/16     16/16     16/16
                 Num samples    7.72M
                 Length s     174.973
                 Scale max   1.000000
                 Window s       0.050

          DC offset, Min level, and Max level are shown, by
          default, in the range ±1.  If the -b (bits) option is
          given,  then these three measurements will be scaled to
          a signed integer with the given  number  of  bits;  for
          example,  for  16  bits,  the  scale would be -32768 to
          +32767.  The -x option  behaves  the  same  way  as  -b
          except  that the signed integer values are displayed in
          hexadecimal.  The -s option scales the  three  measure-
          ments by a given floating-point number.

          Pk lev dB  and  RMS lev dB  are  standard  peak and RMS
          level measured in dBFS.  RMS Pk dB  and  RMS Tr dB  are
          peak  and  trough  values for RMS level measured over a
          short window (default 50ms).

          Crest factor is the standard ratio of peak to RMS level
          (note: not in dB).
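
          The dB figures and the crest factor are related by the
          usual amplitude conversion, dB = 20 log10(level).  The
          sketch below applies it to the Left channel figures from
          the example table; both functions are hypothetical
          helpers, not part of SoX, and because the table values
          are rounded the results are approximate.

```shell
# Hypothetical helper: convert a linear level (range -1 to +1) to dBFS.
level_to_db() {
    LC_ALL=C awk -v x="$1" 'BEGIN {
        if (x < 0) x = -x
        printf "%.2f\n", 20 * log(x) / log(10)
    }'
}

# Hypothetical helper: linear peak-to-RMS ratio from two dB figures,
# computed as 10^((peak - rms) / 20).
crest_factor() {
    LC_ALL=C awk -v pk="$1" -v rms="$2" 'BEGIN {
        printf "%.2f\n", exp((pk - rms) / 20 * log(10))
    }'
}

level_to_db -0.750977        # Left Min level    -> Pk lev dB
crest_factor -2.49 -19.13    # Left Pk/RMS lev dB -> Crest factor
```

          With the Left channel figures these give -2.49 and 6.79,
          matching the Pk lev dB and Crest factor rows of the
          table.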

          Flat factor is a measure of the flatness (i.e. consecu-
          tive samples with the same value) of the signal at  its
          peak  levels  (i.e.  either  Min level,  or Max level).
          Pk count is the number of occasions (not the number  of
          samples)  that the signal attained either Min level, or
          Max level.

          The right-hand Bit-depth figure is the standard defini-
          tion  of  bit-depth i.e. bits less significant than the
          given number are fixed at zero.  The  left-hand  figure
          is  the  number of most significant bits that are fixed
          at zero (or one for negative numbers)  subtracted  from
          the   right-hand   figure  (the  number  subtracted  is
          directly related to Pk lev dB).

          For multi-channel audio, an overall figure for each of
          the above measurements is given and derived from the
          channel figures as follows: DC offset: maximum
          magnitude; Max level, Pk lev dB, RMS Pk dB, Bit-depth:
          maximum; Min level, RMS Tr dB: minimum; RMS lev dB,
          Flat factor, Pk count: average; Crest factor: not
          applicable.

          Length s is the duration in seconds of the audio, and
          Num samples is equal to the sample-rate multiplied by
          Length s.  Scale max is the scaling applied to the first
          three measurements; specifically, it is the maximum
          value that could apply to Max level.  Window s is the
          length of the window used for the peak and trough RMS
          measurements.

          See also the stat effect.

     swap Swap stereo channels.  See also  remix  for  an  effect
          that  allows  arbitrary  channel selection and ordering
          (and mixing).

     stretch factor [window fade shift fading]
          Change the audio duration (but not  its  pitch).   This
          effect  is  broadly equivalent to the tempo effect with
          (factor inverted and) search set to zero,  so  in  gen-
          eral,   its  results  are  comparatively  poor;  it  is
          retained as it  can  sometimes  out-perform  tempo  for
          small factors.

          factor is the stretch factor: greater than 1 lengthens
          the audio, less than 1 shortens it.  window is the
          window size in ms; the default is 20ms.  The fade
          option can be `lin'.  shift is the shift ratio, in the
          range [0 1]; its default depends on the stretch factor:
          1 to shorten, 0.8 to lengthen.  fading is the fading
          ratio, in the range [0 0.5]; its default depends on
          factor and shift.

          See also the tempo effect.

     synth [-j KEY] [-n] [len [off [ph [p1 [p2 [p3]]]]]] {[type]
          [combine] [[%]freq[k][:|+|/|-[%]freq2[k]]]
          [off [ph [p1 [p2 [p3]]]]]}
          This effect can be used to generate fixed or swept fre-
          quency audio tones with various wave shapes, or to gen-
          erate wide-band noise of various  `colours'.   Multiple
          synth  effects  can be cascaded to produce more complex
          waveforms; at each  stage  it  is  possible  to  choose
          whether  the  generated waveform will be mixed with, or
          modulated onto the  output  from  the  previous  stage.
          Audio  for  each  channel in a multi-channel audio file
          can be synthesised independently.

          Though this effect is used to generate audio, an  input
          file  must still be given, the characteristics of which
          will be used to set the synthesised audio  length,  the
          number  of  channels,  and  the sampling rate; however,
          since the input file's audio is not normally needed,  a
          `null  file'  (with the special name -n) is often given
          instead (and the length specified  as  a  parameter  to
          synth or by another given effect that has an
          associated length).

          For example, the following produces a 3 second,  48kHz,
          audio  file  containing  a  sine-wave swept from 300 to
          3300 Hz:

             sox -n output.wav synth 3 sine 300-3300

          and this produces an 8 kHz version:

             sox -r 8000 -n output.wav synth 3 sine 300-3300

          Multiple channels can be synthesised by specifying  the
          set  of parameters shown between braces multiple times;
          the following puts the swept tone in the  left  channel
          and adds `brown' noise in the right:

             sox -n output.wav synth 3 sine 300-3300 brownnoise

          The  following  example shows how two synth effects can
          be cascaded to create a more complex waveform:

             play -n synth 0.5 sine 200-500 synth 0.5 sine fmod 700-100

          Frequencies can also  be  given  in  `scientific'  note
          notation, or, by prefixing a `%' character, as a number
          of semitones relative  to  `middle  A'  (440 Hz).   For
          example,  the  following  could  be used to help tune a
          guitar's low `E' string:

             play -n synth 4 pluck %-29

          or with a (Bourne shell) loop, the whole guitar:

             for n in E2 A2 D3 G3 B3 E4; do
               play -n synth 4 pluck $n repeat 2; done
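
          The `%' notation can be related to a frequency in Hz
          with a short calculation, assuming the default equal
          temperament tuning, in which each semitone multiplies
          the frequency by the twelfth root of two.  In the sketch
          below, semitones_to_hz is a hypothetical helper, not
          part of SoX.

```shell
# Hypothetical helper: frequency in Hz that is $1 semitones away
# from A (440 Hz) in equal temperament: 440 * 2^(n/12).
semitones_to_hz() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * exp(n / 12 * log(2)) }'
}

semitones_to_hz -29    # the low `E' string used above
semitones_to_hz 0      # A itself
```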

          See the delay effect (above) and the reference to  `SoX
          scripting examples' (below) for more synth examples.

          N.B.   This  effect  generates  audio at maximum volume
          (0dBFS), which means that there is  a  high  chance  of
          clipping  when using the audio subsequently, so in many
          cases, you will want to follow  this  effect  with  the
          gain  effect  to prevent this from happening. (See also
          Clipping above.)  Note  that,  by  default,  the  synth
          effect  incorporates  the functionality of gain -h (see
          the gain effect for details); synth's -n option may  be
          given to disable this behaviour.

          A detailed description of each synth parameter follows:

          len is the length of audio to synthesise expressed as a
          time   or   as  a  number  of  samples;  0=inputlength,

          The  format  for  specifying   lengths   in   time   is
          hh:mm:ss.frac.  The format for specifying sample counts
          is the number of samples with the letter  `s'  appended
          to it.

          type  is  one  of  sine,  square,  triangle,  sawtooth,
          trapezium,  exp,  [white]noise,  tpdfnoise,  pinknoise,
          brownnoise, pluck; default=sine.

          combine  is one of create, mix, amod (amplitude modula-
          tion), fmod (frequency modulation); default=create.

          freq/freq2 are the frequencies at the beginning/end  of
          synthesis  in  Hz  or,  if preceded with `%', semitones
          relative to  A  (440 Hz);  alternatively,  `scientific'
          note  notation (e.g. E2) may be used.  The default fre-
          quency is 440Hz.  By default, the tuning used with  the
          note  notations  is  `equal  temperament';  the  -j KEY
          option selects `just intonation', where KEY is an inte-
          ger  number of semitones relative to A (so for example,
          -9 or 3 selects the key of C), or a note in  scientific
          notation.

          If  freq2  is given, then len must also have been given
          and the generated tone will be swept between the  given
          frequencies.   The  two given frequencies must be sepa-
          rated by one of the characters `:', `+', `/',  or  `-'.
          This character is used to specify the sweep function as
          follows:

          :    Linear: the tone will change by a fixed number  of
               hertz per second.

          +    Square:  a second-order function is used to change
               the tone.

          /    Exponential: the tone will change by a fixed  num-
               ber of semitones per second.

          -    Exponential:  as  `/',  but  initial  phase always
               zero, and stepped (less smooth) frequency changes.

          Not used for noise.
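
          The linear (`:') and exponential (`/') sweeps described
          above can be sketched as functions of time.  This is a
          minimal illustration, assuming the sweep runs from freq
          at time 0 to freq2 at time len; the function names are
          hypothetical.  The second-order `+' sweep is left out
          because the text does not give its exact function.

```python
def linear_sweep(f1, f2, t, length):
    """Frequency at time t: changes by a fixed number of hertz per second."""
    return f1 + (f2 - f1) * (t / length)


def exponential_sweep(f1, f2, t, length):
    """Frequency at time t: changes by a fixed number of semitones per second."""
    return f1 * (f2 / f1) ** (t / length)


if __name__ == "__main__":
    # Sweep from 220 Hz to 880 Hz over 2 seconds, sampled at the midpoint.
    print(linear_sweep(220.0, 880.0, 1.0, 2.0))
    print(exponential_sweep(220.0, 880.0, 1.0, 2.0))
```

          Halfway through a two-octave sweep the exponential form
          has risen exactly one octave, whereas the linear form
          is halfway between the two frequencies in hertz.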

          off  is  the bias (DC-offset) of the signal in percent;

          ph is  the  phase  shift  in  percentage  of  1  cycle;
          default=0.  Not used for noise.

          p1  is  the  percentage  of  each  cycle  that  is `on'
          (square),  or  `rising'  (triangle,  exp,   trapezium);
          default=50 (square, triangle, exp), default=10 (trapez-
          ium), or sustain (pluck); default=40.

          p2 (trapezium): the percentage through  each  cycle  at
          which  `falling' begins; default=50. exp: the amplitude
          in multiples of 2dB;  default=50,  or  tone-1  (pluck);

          p3  (trapezium):  the  percentage through each cycle at
          which `falling' ends; default=60,  or  tone-2  (pluck);
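
          To show how p1, p2 and p3 divide a trapezium cycle, the
          sketch below evaluates one cycle at a given percentage
          through it, using the trapezium defaults listed above
          (10, 50, 60).  The output range of -1 to +1 and the
          straight-line ramps are assumptions made for illustra-
          tion; the function is hypothetical and is not SoX's
          implementation.

```python
def trapezium(phase, p1=10.0, p2=50.0, p3=60.0):
    """Value of one trapezium cycle at `phase` (percent through the cycle)."""
    phase %= 100.0
    if phase < p1:                       # rising: ramp from -1 up to +1
        return -1.0 + 2.0 * phase / p1
    if phase < p2:                       # `on': hold at +1
        return 1.0
    if phase < p3:                       # falling: ramp from +1 down to -1
        return 1.0 - 2.0 * (phase - p2) / (p3 - p2)
    return -1.0                          # remainder of the cycle: hold at -1


if __name__ == "__main__":
    for pct in (0, 5, 10, 30, 55, 60, 80):
        print(pct, trapezium(pct))
```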

     tempo [-q] [-m|-s|-l] factor [segment [search [overlap]]]
          Change the audio playback speed but not its pitch. This
          effect uses the WSOLA algorithm. The audio  is  chopped
          up  into  segments  which  are then shifted in the time
          domain and overlapped  (cross-faded)  at  points  where
          their  waveforms are most similar as determined by mea-
          surement of `least squares'.

          By default, linear searches are used to find  the  best
          overlapping  points.  If  the  optional -q parameter is
          given, tree searches are used instead. This  makes  the
          effect  work more quickly, but the result may not sound
          as good. However, if you must  improve  the  processing
          speed,  this  generally  reduces the sound quality less
          than reducing the search or overlap values.

          The -m option is used to  optimize  default  values  of
          segment, search and overlap for music processing.

          The  -s  option  is  used to optimize default values of
          segment, search and overlap for speech processing.

          The -l option is used to  optimize  default  values  of
          segment,  search  and  overlap  for `linear' processing
          that tends to cause more noticeable distortion but  may
          be useful when factor is close to 1.

          If  -m,  -s,  or  -l is specified, the default value of
          segment will  be  calculated  based  on  factor,  while
          default search and overlap values are based on segment.
          Any values you provide  still  override  these  default
          values.

          factor  gives  the ratio of new tempo to the old tempo,
          so e.g. 1.1 speeds up the tempo by 10%, and  0.9  slows
          it down by 10%.

          The  optional segment parameter selects the algorithm's
          segment size in milliseconds.  If no  other  flags  are
          specified,  the  default  value  is 82 and is typically
          suited to making small changes to the tempo  of  music.
          For larger changes (e.g. a factor of 2), 41 ms may give
          a better result.  The -m, -s, and -l flags  will  cause
          the  segment default to be automatically adjusted based
          on factor.  For example using -s (for  speech)  with  a
          tempo of 1.25 will calculate a default segment value of

          The optional search parameter gives the audio length in
          milliseconds  over  which the algorithm will search for
          overlapping points.  If no other flags  are  specified,
          the  default  value  is  14.68.  Larger values use more
          processing time and  may  or  may  not  produce  better
          results.  A practical maximum is half the value of seg-
          ment. Search can be reduced to cut processing  time  at
          the  risk  of degrading output quality. The -m, -s, and
          -l flags will cause the search default to be  automati-
          cally adjusted based on segment.

          The  optional overlap parameter gives the segment over-
          lap length in milliseconds.  Default value is  12,  but
          -m,  -s, or -l flags automatically adjust overlap based
          on segment size. Increasing overlap increases  process-
          ing  time and may increase quality. A practical maximum
          for overlap is the value of search, with overlap  typi-
          cally being (at least) a little smaller than search.
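
          The relationships between factor, segment, search and
          overlap described above can be checked with a short
          sketch.  Both functions are hypothetical helpers: the
          first assumes that the output duration is the input
          duration divided by factor, and the second only tests
          the `practical maximum' guidance in the text (it does
          not reproduce how -m, -s or -l derive their defaults).

```python
def tempo_output_duration(duration, factor):
    """Duration after `tempo factor`; factor is new tempo / old tempo."""
    return duration / factor


def tempo_param_warnings(segment=82.0, search=14.68, overlap=12.0):
    """Return warnings for values beyond the practical maxima (all in ms)."""
    warnings = []
    if search > segment / 2.0:
        warnings.append("search exceeds half of segment")
    if overlap > search:
        warnings.append("overlap exceeds search")
    return warnings


if __name__ == "__main__":
    print(tempo_output_duration(60.0, 1.1))   # 10% faster, so shorter
    print(tempo_param_warnings())             # documented defaults
    print(tempo_param_warnings(segment=41.0, search=30.0))
```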

          See also speed for an effect that  changes  tempo  and
          pitch together, pitch for an effect that changes pitch
          but not tempo, and stretch for an effect that  changes
          tempo using a different algorithm.

     treble gain [frequency[k] [width[s|h|k|o|q]]]
          Apply a treble tone-control effect.  See  the  descrip-
          tion of the bass effect for details.

     tremolo speed [depth]
          Apply  a  tremolo  (low frequency amplitude modulation)
          effect to the audio.  The tremolo frequency  in  Hz  is
          given  by speed, and the depth as a percentage by depth
          (default 40).
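
          Low frequency amplitude modulation can be pictured as a
          slowly varying gain applied to each sample.  The sketch
          below is one common formulation, assuming a sine-shaped
          envelope that swings between full level and a level
          reduced by depth percent; the exact envelope used by
          SoX is not stated here, so treat the function (a hypo-
          thetical name) as illustrative only.

```python
import math


def tremolo_gain(t, speed, depth=40.0):
    """Gain at time t (seconds) for a tremolo of `speed` Hz and `depth` percent."""
    d = depth / 100.0
    # Oscillates between (1 - d) and 1 at `speed` cycles per second.
    return 1.0 - d / 2.0 + (d / 2.0) * math.sin(2.0 * math.pi * speed * t)


if __name__ == "__main__":
    # One cycle of a 1 Hz tremolo at the default depth, in quarter-second steps.
    for step in range(5):
        print(step / 4.0, round(tremolo_gain(step / 4.0, 1.0), 3))
```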

     trim start [length|=end]
          Trim can trim off unwanted audio from the beginning and
          end  of  the  audio.   Audio  is not sent to the output
          stream until the start location is reached.

          The optional length parameter gives the length of audio
          to  output  after  the start sample and is thus used to
          trim off the end of the audio.  Alternatively, an abso-
          lute  end location can be given by preceding it with an
          equals sign.  Using a value of 0 for the start  parame-
          ter will allow trimming off the end only.

          Both parameters can be specified using either an amount
          of time or an exact count of samples.  The  format  for
          specifying  lengths  in time is hh:mm:ss.frac.  A start
          value of 1:30.5 will not start until 1  minute,  thirty
          and  1/2 seconds into the audio.  The format for speci-
          fying sample counts is the number of samples  with  the
          letter  `s'  appended  to it.  A value of 8000s for the
          start parameter will wait until 8000 samples  are  read
          before starting to process audio.
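
          The two ways of writing a position, and the difference
          between a length and an `=end' location, can be sketched
          as follows.  The helper names are hypothetical and the
          parser covers only the two forms described above (a
          time with up to three colon-separated fields, or a sam-
          ple count ending in `s'); it is not SoX's own parser.

```python
def position_to_samples(spec, rate):
    """Convert `hh:mm:ss.frac` or `<count>s` into a sample count at `rate` Hz."""
    if spec.endswith("s"):
        return int(spec[:-1])            # explicit sample count, e.g. 8000s
    seconds = 0.0
    for field in spec.split(":"):        # 1:30.5 -> 1 minute, 30.5 seconds
        seconds = seconds * 60.0 + float(field)
    return round(seconds * rate)


def trim_span(start, end_spec, rate):
    """Return (first, end) sample positions for `trim start [length|=end]`.

    `end` is exclusive: it is the first sample that is no longer output.
    """
    first = position_to_samples(start, rate)
    if end_spec.startswith("="):         # absolute end location
        return first, position_to_samples(end_spec[1:], rate)
    return first, first + position_to_samples(end_spec, rate)


if __name__ == "__main__":
    print(position_to_samples("1:30.5", 8000))
    print(position_to_samples("8000s", 44100))
    print(trim_span("0", "=10", 8000))   # keep only the first ten seconds
```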

     vad [options]
          Voice  Activity Detector.  Attempts to trim silence and
          quiet background sounds from the ends of  (fairly  high
          resolution i.e. 16-bit, 44-48kHz) recordings of speech.
          The algorithm currently uses a  simple  cepstral  power
          measurement  to detect voice, so may be fooled by other
          things, especially music.  The  effect  can  trim  only
          from  the  front of the audio, so in order to trim from
          the back, the reverse effect must also be used.  E.g.

             play speech.wav norm vad

          to trim from the front,

             play speech.wav norm reverse vad reverse

          to trim from the back, and

             play speech.wav norm vad reverse vad reverse

          to trim from both ends.  The use of the norm effect  is
          recommended, but remember that neither reverse nor norm
          is suitable for use with streamed audio.

          Default values are shown in parentheses.

          -t num (7)
               The measurement level  used  to  trigger  activity
               detection.   This might need to be changed depend-
               ing on the noise level,  signal  level  and  other
               characteristics of the input audio.

          -T num (0.25)
               The time constant (in seconds) used to help ignore
               short bursts of sound.

          -s num (1)
               The amount of audio (in  seconds)  to  search  for
               quieter/shorter  bursts  of audio to include prior
               to the detected trigger point.

          -g num (0.25)
               Allowed gap (in seconds)  between  quieter/shorter
               bursts  of  audio to include prior to the detected
               trigger point.

          -p num (0)
               The amount  of  audio  (in  seconds)  to  preserve
               before  the  trigger  point  and  any  found  qui-
               eter/shorter bursts.

          Advanced Options:
          These allow fine  tuning  of  the  algorithm's internal
          parameters.

          -b num
               The  algorithm  (internally)  uses  adaptive noise
               estimation/reduction in order to detect the  start
               of  the  wanted  audio.  This option sets the time
               for the initial noise estimate.

          -N num
               Time constant used by the adaptive noise estimator
               for when the noise level is increasing.

          -n num
               Time constant used by the adaptive noise estimator
               for when the noise level is decreasing.

          -r num
               Amount of noise reduction to use in the  detection
               algorithm (e.g. 0, 0.5, ...).

          -f num
               Frequency  of  the algorithm's processing/measure-
               ments.

          -m num
               Measurement duration; by default, twice  the  mea-
               surement period; i.e.  with overlap.

          -M num
               Time  constant  used  to  smooth spectral measure-
               ments.

          -h num
               `Brick-wall' frequency of high-pass filter applied
               at the input to the detector algorithm.

          -l num
               `Brick-wall'  frequency of low-pass filter applied
               at the input to the detector algorithm.

          -H num
               `Brick-wall' quefrency of high-pass lifter used in
               the detector algorithm.

          -L num
               `Brick-wall'  quefrency of low-pass lifter used in
               the detector algorithm.

          See also the silence effect.

     vol gain [type [limitergain]]
          Apply an amplification or an attenuation to  the  audio
          signal.   Unlike  the -v option (which is used for bal-
          ancing multiple input  files  as  they  enter  the  SoX
          effects  processing  chain),  vol is an effect like any
          other so can be applied anywhere, and several times  if
          necessary, during the processing chain.

          The  amount to change the volume is given by gain which
          is interpreted, according to the given  type,  as  fol-
          lows:  if  type is amplitude (or is omitted), then gain
          is an amplitude (i.e.  voltage  or  linear)  ratio,  if
          power,  then  a power (i.e. wattage or voltage-squared)
          ratio, and if dB, then a power change in dB.

          When type is amplitude or power, a gain of 1 leaves the
          volume unchanged, less than 1 decreases it, and greater
          than 1 increases it; a negative gain inverts the  audio
          signal in addition to adjusting its volume.

          When  type  is  dB,  a  gain  of  0  leaves  the volume
          unchanged, less than 0 decreases it, and greater than 0
          increases it.
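
          The three interpretations of gain can be compared by
          converting each to the linear multiplier applied to the
          samples.  This is a minimal sketch using the standard
          relations between amplitude ratios, power ratios and
          decibels; the function name is hypothetical and the
          limiter is not modelled.

```python
import math


def vol_to_amplitude(gain, kind="amplitude"):
    """Linear sample multiplier for `vol gain [type]`."""
    if kind == "amplitude":
        return gain
    if kind == "power":
        # Amplitude is the square root of a power ratio; a negative
        # gain keeps its sign so that it still inverts the signal.
        return math.copysign(math.sqrt(abs(gain)), gain)
    if kind == "dB":
        return 10.0 ** (gain / 20.0)
    raise ValueError(f"unknown type: {kind!r}")


if __name__ == "__main__":
    print(vol_to_amplitude(0.5))             # amplitude ratio
    print(vol_to_amplitude(4.0, "power"))    # power ratio
    print(vol_to_amplitude(-6.0, "dB"))      # change in dB
```

          A gain of 1 (amplitude or power) and a gain of 0 (dB)
          both map to a multiplier of 1, matching the `volume
          unchanged' cases described above.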

          See  [4]  for  a detailed discussion on electrical (and
          hence audio signal) voltage and power ratios.

          Beware of Clipping when increasing the volume.

          The gain and the type parameters can be concatenated if
          desired, e.g.  vol 10dB.

          An  optional  limitergain  value  can  be specified and
          should be a value much less than 1 (e.g. 0.05 or  0.02)
          and  is  used  only  on peaks to prevent clipping.  Not
          specifying this parameter will cause no limiter  to  be
          used.   In  verbose  mode, this effect will display the
          percentage of the audio that needed to be limited.

          See also gain for a volume-changing effect with differ-
          ent  capabilities, and compand for a dynamic-range com-
          pression/expansion/limiting effect.

  Deprecated Effects
     The following effects have been renamed or have their  func-
     tionality  included in another effect; they continue to work
     in this version of SoX but may be removed in future.

     filter [low]-[high] [window-len [beta]]
          Apply a sinc-windowed lowpass,  highpass,  or  bandpass
          filter  of  given  window  length  to the signal.  This
          effect has been superseded by the  sinc  effect.   Com-
          pared  with  `sinc',  `filter'  is slower and has fewer

          low refers to the frequency of the lower 6dB corner  of
          the  filter.  high refers to the frequency of the upper
          6dB corner of the filter.

          A low-pass filter is obtained by leaving  low  unspeci-
          fied,  or 0.  A high-pass filter is obtained by leaving
          high unspecified, or 0, or greater than or equal to the
          Nyquist frequency.

          The   window-len,  if  unspecified,  defaults  to  128.
          Longer windows give a sharper cut-off, smaller  windows
          a more gradual cut-off.

          The beta parameter determines the type of filter window
          used.  Any value greater than  2  is  the  beta  for  a
          Kaiser  window.   Beta  <= 2 selects a Blackman-Nuttall
          window.  If unspecified, the default is a Kaiser window
          with beta 16.
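
          The rules above for choosing the filter type and the
          window can be written out as a small decision sketch.
          The function names and return values are hypothetical;
          only the selection logic is taken from the text, with
          None standing for an unspecified parameter.

```python
def filter_kind(low, high, rate):
    """Classify the filter from its corner frequencies (Hz) and sample rate."""
    nyquist = rate / 2.0
    has_low = bool(low)                        # None or 0: no lower corner
    has_high = bool(high) and high < nyquist   # None, 0 or >= Nyquist: no upper corner
    if has_low and has_high:
        return "bandpass"
    if has_high:
        return "lowpass"
    if has_low:
        return "highpass"
    return "none"


def window_kind(beta=None):
    """Return (window name, Kaiser beta or None) for the given beta."""
    if beta is None:
        return ("kaiser", 16.0)                # documented default
    if beta > 2:
        return ("kaiser", float(beta))
    return ("blackman-nuttall", None)


if __name__ == "__main__":
    print(filter_kind(None, 3000, 44100), window_kind())
    print(filter_kind(300, 22050, 44100), window_kind(2))
```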

          In  the  case  of Kaiser window (beta > 2), lower betas
          produce a somewhat faster transition from pass-band  to
          stop-band,  at the cost of noticeable artifacts. A beta
          of 16 is the default, beta less than 10 is  not  recom-
          mended.  If  you  want a sharper cut-off, don't use low
          betas; use a longer sample window. A  Blackman-Nuttall
          window  is  selected by specifying any `beta' <= 2, and
          the Blackman-Nuttall window has somewhat  steeper  cut-
          off  than  the default Kaiser window. You will probably
          not need to use the beta parameter at all,  unless  you
          are  just curious about comparing the effects of Black-
          man-Nuttall vs. Kaiser windows.

          This effect supports the --plot global option.

     key [-q] shift [segment [search [overlap]]]
          Change the audio key (i.e. pitch but not tempo).   This
          is just an alias for the pitch effect.

     pan direction
          Mix  the  audio from one channel to another.  Use mixer
          or remix instead of this effect.

          The direction is a value from -1 to 1.   -1  represents
          far left and 1 represents far right.

     polyphase [-w nut|ham] [-width n] [-cut-off c]
     rabbit [-c0|-c1|-c2|-c3|-c4]
     resample [-qs|-q|-ql] [rolloff [beta]]
          Formerly  sample-rate-changing  effects  in  their  own
          right, these are now just aliases for the rate  effect.

     Exit  status is 0 for no error, 1 if there is a problem with
     the command-line parameters, or 2 if an error occurs  during
     file processing.
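
     A calling script might translate these exit statuses into
     messages as in the sketch below.  The table and function
     names are hypothetical, and the fallback text for any other
     value is an assumption, since only 0, 1 and 2 are documented
     here.

```python
# Exit statuses documented above; anything else is reported as unknown.
SOX_EXIT_MEANINGS = {
    0: "no error",
    1: "problem with the command-line parameters",
    2: "error during file processing",
}


def describe_exit_status(code):
    """Return a short description of a sox exit status."""
    return SOX_EXIT_MEANINGS.get(code, "unknown status")


if __name__ == "__main__":
    # In a real script the code would come from running sox, for example
    # via subprocess.run([...]).returncode; fixed values are used here.
    for code in (0, 1, 2):
        print(code, describe_exit_status(code))
```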

     Please  report  any bugs found in this version of SoX to the
     mailing list (

     See  attributes(5)  for  descriptions   of   the   following

     |Availability   | audio/sox        |
     |Stability      | Uncommitted      |
     soxi(1), soxformat(4), libsox(3)
     audacity(1), gnuplot(1), octave(1), wget(1)
     The SoX web site at
     SoX     scripting     examples     at     http://sox.source-

     [1]  R. Bristow-Johnson,  Cookbook  formulae  for  audio  EQ
          biquad               filter               coefficients,

     [2]  Wikipedia,                                    Q-factor,

     [3]  Scott  Lehman,  Effects  Explained, http://harmony-cen-

     [4]  Wikipedia, Decibel,

     [5]  Richard  Furse,  Linux  Audio Developer's Simple Plugin

     [6]  Richard     Furse,     Computer     Music      Toolkit,

     [7]  Steve Harris, LADSPA plugins,

     Copyright 1998-2011 Chris Bagwell and SoX Contributors.
     Copyright 1991 Lance Norskog and Sundry Contributors.

     This  program  is  free  software;  you  can redistribute it
     and/or modify it under the terms of the GNU  General  Public
     License as published by the Free Software Foundation; either
     version 2, or (at your option) any later version.

     This program is distributed in the hope that it will be use-
     ful, but WITHOUT ANY WARRANTY; without even the implied war-
     ranty of MERCHANTABILITY or FITNESS FOR  A  PARTICULAR  PUR-
     POSE.   See the GNU General Public License for more details.

     Chris   Bagwell   (    Other
     authors  and  contributors  are listed in the ChangeLog file
     that is distributed with the source code.

     This  software  was   built   from   source   available   at    The  original
     community   source   was   downloaded   from    http://down-

     Further information about this software can be found on  the
     open   source   community   website   at  http://sox.source-

sox               Last change: February 19, 2011               74