Sample¶
- class maelzel.snd.audiosample.Sample(sound, sr=0, start=0.0, end=0.0, readonly=False, engine=None)[source]¶
Bases: object
A class representing audio data
- Parameters:
sound (str | Path | np.ndarray) – either sample data as a numpy array, or a path to a soundfile (as str or Path)
sr (int) – the sample rate; only needed if an array is passed
start – the start time (only valid when reading from a soundfile). Can be negative, in which case the frame is sought from the end.
end – the end time (only valid when reading from a soundfile). Can be negative, in which case the frame is sought from the end
readonly – is this Sample readonly?
engine (csoundengine.Engine | None) – the sound engine (csoundengine.Engine) used for playback
Attributes Summary
duration: The duration in seconds
numframes: The number of frames
Methods Summary
addChannels(channels): Create a new Sample with added channels
appendSilence(dur): Return a new Sample with added silence at the end
asbpf(): Convert this sample to a bpf4.core.Sampled bpf
chunks(chunksize[, hop, pad]): Iterate over the samples in chunks of chunksize
concat(*other): Join (concatenate) this Sample with other(s)
contiguous(): Return a Sample ensuring that the samples are contiguous in memory
copy(): Return a copy of this Sample
createSilent(dur, channels, sr): Generate a silent Sample with the given characteristics
fade(fadetime[, shape]): Fade this Sample inplace, returns self
firstPitch([threshold, minfreq, overlap, ...]): Returns the first (monophonic) pitch found
firstSilence([threshold, period, overlap, ...]): Find the first silence in this sample
firstSound([threshold, period, overlap, start]): Find the time of the first sound within this sample
fundamental([fftsize, overlap, unvoiced, ...]): Track the fundamental frequency of this sample
fundamentalAnalysis([semitoneQuantization, ...]): Analyze the fundamental of this sound, assuming it is a monophonic sound
fundamentalBpf([fftsize, overlap, method, ...]): Construct a bpf which follows the fundamental of this sample in time
fundamentalFreq([time, dur, fftsize, ...]): Calculate the fundamental freq. at a given time
getChannel(n[, contiguous]): Return a new mono Sample with the given channel
getEngine(**kws): Returns the csound Engine used for playback, starting it if needed
join(samples): Concatenate a sequence of Samples
mix(samples[, offsets, gains]): Static method: mix the given samples down, optionally with a time offset
mixdown([enforceCopy]): Return a new Sample with this sample downmixed to mono
normalize([headroom]): Normalize inplace, returns self
onsets([fftsize, overlap, method, ...]): Detect onsets
openInEditor([wait, app, fmt]): Open the sample in an external editor
partialTrackingAnalysis([resolution, ...]): Analyze this audiosample using partial tracking
peak(): Highest sample value in dB
peaksbpf([framedur, overlap]): Create a bpf representing the peaks envelope of the source
play([loop, chan, gain, delay, pan, speed, ...]): Play the given sample
plot([profile]): Plot the sample data
plotMelSpectrogram([fftsize, overlap, ...]): Plot a mel-scale spectrogram
plotSpectrogram([fftsize, window, winsize, ...]): Plot the spectrogram of this sound using matplotlib
plotSpetrograph([framesize, window, start, dur]): Plot the spectrograph of this sample or a fragment thereof
preparePlay([engine]): Send audio data to the audio engine (blocking)
prependSilence(dur): Return a new Sample with silence of given dur at the beginning
reprHtml([withHeader, withAudiotag, ...]): Returns an HTML representation of this Sample
resample(sr): Return a new Sample with the given sr
reverse(): Reverse the sample in-place, returns self
rms(): RMS of the samples
rmsbpf([dt, overlap]): Creates a BPF representing the rms of this sample over time
scrub(bpf): Scrub the samples with the given curve
show([withAudiotag, figsize, external, profile])
spectrumAt(time[, resolution, channel, ...]): Analyze sinusoidal components of this Sample at the given time
strip([threshold, margin, window]): Remove silence from the sides
stripLeft([threshold, margin, window]): Remove silence from the left
stripRight([threshold, margin, window]): Remove silence from the right
write(outfile[, encoding, overflow, fmt, ...]): Write the samples to outfile
Attributes Documentation
- duration¶
The duration in seconds
- numframes¶
The number of frames
Methods Documentation
- addChannels(channels)[source]¶
Create a new Sample with added channels
- Parameters:
channels (ndarray | int) – the audio data of the new channels, or the number of empty channels to add (as an integer). When passing audio data, the new samples must have exactly the same number of frames as self
- Return type:
Sample
- Returns:
a new Sample with the added channels. The returned Sample will have the same duration as self
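For illustration, the frame-count requirement above can be sketched in plain numpy. `add_channels` is a hypothetical standalone helper, not the actual implementation:

```python
import numpy as np

def add_channels(samples: np.ndarray, channels) -> np.ndarray:
    """Append channels: either audio data with the same number of frames,
    or an integer count of empty (silent) channels."""
    if samples.ndim == 1:                       # treat mono as one column
        samples = samples[:, None]
    if isinstance(channels, int):
        channels = np.zeros((len(samples), channels))
    elif channels.ndim == 1:
        channels = channels[:, None]
    assert len(channels) == len(samples), "frame counts must match"
    return np.hstack([samples, channels])

mono = np.full(4, 0.5)
stereo = add_channels(mono, 1)   # add one silent channel
print(stereo.shape)              # → (4, 2)
```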
- appendSilence(dur)[source]¶
Return a new Sample with added silence at the end
- Parameters:
dur (float) – the duration of the added silence
- Return type:
Sample
- Returns:
a new Sample
See also
Sample.prependSilence(), Sample.join(), Sample.append()
- chunks(chunksize, hop=None, pad=False)[source]¶
Iterate over the samples in chunks of chunksize.
If pad is True, the last chunk will be zero-padded if necessary
- Parameters:
chunksize (int) – the size of each chunk
hop (int) – the number of samples to advance between the start of consecutive chunks
pad – if True, pad the last chunk with 0 to fill chunksize
- Return type:
Iterator[np.ndarray]
- Returns:
an iterator over the chunks
- concat(*other)[source]¶
Join (concatenate) this Sample with other(s)
- Parameters:
*other (Sample) – one or more Samples to join together
- Return type:
Sample
- Returns:
the resulting Sample
- contiguous()[source]¶
Return a Sample ensuring that the samples are contiguous in memory
If self is already contiguous, self is returned
- Return type:
Sample
- copy()[source]¶
Return a copy of this Sample
- Return type:
Sample
Note
if self is readonly, the copied Sample will not be readonly.
- classmethod createSilent(dur, channels, sr)[source]¶
Generate a silent Sample with the given characteristics
- Parameters:
dur (float) – the duration of the new Sample
channels (int) – the number of channels
sr (int) – the sample rate
- Return type:
Sample
- Returns:
a new Sample with all samples set to 0
- fade(fadetime, shape='linear')[source]¶
Fade this Sample inplace, returns self.
If a single value is given as fadetime, a fade-in and a fade-out are both performed with this fadetime. A tuple can be used to apply different fade times for in and out.
- Parameters:
fadetime (float | tuple[float, float]) – the duration of the fade
shape (str) – the shape of the fade. One of 'linear', 'expon(x)', 'halfcos'
- Return type:
Sample
- Returns:
self
Note
To generate a faded sample without modifying the original sample, use
sample = sample.copy().fade(...)
Example:
>>> sample1 = Sample("sound.wav")
>>> # Fade-in and fade-out
>>> sample1.fade(0.2)
>>> sample2 = Sample("another.wav")
>>> # Create a copy with a fade-out of 200 ms
>>> sample3 = sample2.copy().fade((0, 0.2))
- firstPitch(threshold=-120, minfreq=60, overlap=4, channel=0, chunkdur=0.25)[source]¶
Returns the first (monophonic) pitch found
- Parameters:
threshold – the silence threshold
minfreq – the min. frequency to consider valid
overlap – pitch analysis overlap
channel – for multichannel audio, which channel to use
chunkdur – chunk duration to analyze, in seconds
- Return type:
tuple[float, float]
- Returns:
a tuple (time, freq) of the first pitched sound found. If no pitched sound is found, (0, 0) is returned
- firstSilence(threshold=-80, period=0.04, overlap=2, soundthreshold=-50, start=0.0)[source]¶
Find the first silence in this sample
- Parameters:
threshold – rms value which counts as silence, in dB
period – the time period to calculate the rms
overlap – determines the step size between rms calculations
soundthreshold – rms value which counts as sound, in dB
start – start time (0=start of sample)
- Return type:
float | None
- Returns:
the time of the first silence, or None if no silence found
- firstSound(threshold=-120.0, period=0.04, overlap=2, start=0.0)[source]¶
Find the time of the first sound within this sample
This does not distinguish between background noise and pitched/voiced sound
- Parameters:
threshold – the sound threshold in dB.
period – the time period to calculate the rms
overlap – determines the step size between rms calculations
start – start time (0=start of sample)
- Return type:
float | None
- Returns:
the time of the first sound, or None if no sound found
- fundamental(fftsize=2048, overlap=4, unvoiced='negative', minAmpDb=-60, sensitivity=0.7)[source]¶
Track the fundamental frequency of this sample
- Parameters:
fftsize – the fft size to use
overlap – number of overlaps
unvoiced – one of ‘negative’ or ‘nan’
- Return type:
tuple[ndarray, ndarray]
- Returns:
a tuple (times, freqs), both numpy arrays. The frequency array will contain a negative frequency whenever the sound is unvoiced (inharmonic, no fundamental can be predicted)
See also
maelzel.snd.vamptools.pyinSmoothPitch(), maelzel.snd.freqestimate.f0curvePyinVamp()
- fundamentalAnalysis(semitoneQuantization=0, fftsize=None, simplify=0.08, overlap=8, minFrequency=50, minSilence=0.08, onsetThreshold=0.05, onsetOverlap=8)[source]¶
Analyze the fundamental of this sound, assuming it is a monophonic sound
This is a wrapper around maelzel.transcribe.mono.FundamentalAnalysisMono and is placed here for visibility and ease of use. To access all parameters, use that class directly.
- Return type:
mono.FundamentalAnalysisMonophonic
- Returns:
a maelzel.transcribe.mono.FundamentalAnalysisMono
Example
>>> from maelzel.snd import audiosample
>>> samp = audiosample.Sample("sndfile.wav")
>>> f0analysis = samp.fundamentalAnalysis()
>>> notes = [(group.start(), group.duration(), group.meanfreq())
...          for group in f0analysis.groups]
- fundamentalBpf(fftsize=2048, overlap=4, method='pyin', unvoiced='negative')[source]¶
Construct a bpf which follows the fundamental of this sample in time
Note
The method 'pyin-vamp' depends on the python module 'vamphost' and the pyin vamp plugin being installed.
vamp host: original code at https://code.soundsoftware.ac.uk/projects/vampy-host (install via pip install vamphost)
pyin plugin: can be downloaded from https://code.soundsoftware.ac.uk/projects/pyin/files. More information about installing VAMP plugins: https://www.vamp-plugins.org/download.html#install
- Parameters:
fftsize – the size of the fft, in samples
overlap – determines the hop size
method – one of ‘pyin’ or ‘fft’.
unvoiced – method to handle unvoiced sections. One of ‘nan’, ‘negative’, ‘keep’
- Return type:
BpfInterface
- Returns:
a bpf representing the fundamental freq. of this sample
- fundamentalFreq(time=None, dur=0.2, fftsize=2048, overlap=4, fallbackfreq=0)[source]¶
Calculate the fundamental freq. at a given time
The returned frequency is averaged over the given duration period At the moment the smooth pyin method is used
- Parameters:
time (float | None) – the time to start sampling the fundamental frequency. If None is given, the first actual sound within this Sample is used
dur – the duration of the estimation period. The returned frequency will be the average frequency over this period of time.
fftsize – the fftsize used
fallbackfreq – frequency to use when no fundamental frequency was detected
overlap – amount of overlaps per fftsize, determines the hop time
- Return type:
float | None
- Returns:
the average frequency within the given period of time, or None if no fundamental was found
- getChannel(n, contiguous=False)[source]¶
Return a new mono Sample with the given channel
- Parameters:
n (int) – the channel index (starting with 0)
contiguous – if True, ensure that the samples are represented as contiguous in memory
- Return type:
Sample
- static getEngine(**kws)[source]¶
Returns the csound Engine used for playback, starts the Engine if needed
If no playback has been performed up to this point, a new Engine is created. Keywords are passed directly to csoundengine.Engine (https://csoundengine.readthedocs.io/en/latest/api/csoundengine.engine.Engine.html#csoundengine.engine.Engine) and will only take effect if this function is called before any playback has been performed. An already existing Engine can be set as the playback engine via Sample.setEngine()
- Return type:
csoundengine.Engine
See also
Sample.setEngine()
- static join(samples)[source]¶
Concatenate a sequence of Samples
Samples should share numchannels. If mismatching samplerates are found, all samples are upsampled to the highest sr
- Parameters:
samples (Sequence[Sample]) – a seq. of Samples
- Return type:
Sample
- Returns:
the concatenated samples as one Sample
- static mix(samples, offsets=None, gains=None)[source]¶
Static method: mix the given samples down, optionally with a time offset
This is a static method. All samples should share the same number of channels and sr
- Parameters:
samples (list[Sample]) – the Samples to mix
offsets (list[float] | None) – if given, an offset in seconds for each sample
gains (list[float] | None) – if given, a gain for each sample
- Return type:
Sample
- Returns:
the resulting Sample
Example:
>>> from maelzel.snd.audiosample import Sample
>>> a = Sample("stereo-2seconds.wav")
>>> b = Sample("stereo-3seconds.wav")
>>> m = Sample.mix([a, b], offsets=[2, 0])
>>> m.duration
4.0
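The offset/gain mixing logic can be sketched for mono numpy arrays sharing a sample rate, as required above. `mix_arrays` is a hypothetical helper, not the actual implementation:

```python
import numpy as np

def mix_arrays(samples: list, sr: int, offsets=None, gains=None) -> np.ndarray:
    """Sum mono arrays into one buffer, each delayed by its offset (seconds)."""
    offsets = offsets if offsets is not None else [0.0] * len(samples)
    gains = gains if gains is not None else [1.0] * len(samples)
    # the output is long enough to hold the latest-ending sample
    end = max(int(o * sr) + len(s) for s, o in zip(samples, offsets))
    out = np.zeros(end)
    for s, o, g in zip(samples, offsets, gains):
        start = int(o * sr)
        out[start:start + len(s)] += np.asarray(s) * g
    return out

sr = 10
a = np.ones(2 * sr)              # 2 s
b = np.full(3 * sr, 0.5)         # 3 s
m = mix_arrays([a, b], sr, offsets=[2.0, 0.0])
print(len(m) / sr)               # → 4.0
```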
- mixdown(enforceCopy=False)[source]¶
Return a new Sample with this sample downmixed to mono
- Parameters:
enforceCopy – always return a copy, even if self is already mono
- Return type:
Sample
- Returns:
a mono version of self.
- normalize(headroom=0.0)[source]¶
Normalize inplace, returns self
- Parameters:
headroom – maximum peak in dB
- Return type:
Sample
- Returns:
self
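Peak normalization with headroom can be sketched in plain numpy. This is a hypothetical standalone `normalize`, not the method itself:

```python
import numpy as np

def normalize(samples: np.ndarray, headroom: float = 0.0) -> np.ndarray:
    """Scale in place so the absolute peak sits at -headroom dBFS."""
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples *= 10 ** (-headroom / 20) / peak
    return samples

sig = np.array([0.1, -0.25, 0.2])
normalize(sig, headroom=6.0)     # peak now at -6 dBFS (~0.501)
```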
- onsets(fftsize=2048, overlap=4, method='rosita', threshold=None, mingap=0.03)[source]¶
Detect onsets
Depending on the implementation, onsets can be “positive” onsets, similar to an attack, or just sudden changes in the spectrum; this includes “negative” onsets, which would be detected at the sudden end of a note. To accurately track onsets it might be useful to use other features, like peak amplitude, rms or voicedness, to check the kind of onset.
For an in-depth demonstration of these concepts see https://github.com/gesellkammer/maelzel/blob/master/notebooks/onsets.ipynb
- Parameters:
fftsize – the size of the window
overlap – a hop size as a fraction of the fftsize
method (str) – one of 'rosita' (using a lightweight version of librosa's onset detection algorithm) or 'aubio' (needs aubio to be installed)
threshold (float | None) – the onset sensitivity. This is a value specific to a given method (rosita has a default of 0.07, while aubio has a default of 0.03)
mingap – the min. time between two onsets
- Return type:
ndarray
- Returns:
a list of onset times, as a numpy array
Example
from maelzel.snd import audiosample
from maelzel.core import *
from pitchtools import *
samp = audiosample.Sample("snd/finneganswake-fragm01.flac").getChannel(0, contiguous=True)[0:10]
onsets = samp.onsets(threshold=0.1, mingap=0.05)
ax = samp.plotSpectrogram()
# Plot each onset as a vertical line
ax.vlines(onsets, ymin=0, ymax=10000, color='white', alpha=0.4, linewidth=2)
See also
maelzel.snd.features.onsetsAubio
maelzel.snd.features.onsets
- openInEditor(wait=True, app=None, fmt='wav')[source]¶
Open the sample in an external editor.
The original is not changed.
- Parameters:
wait – if True, the editor is opened in blocking mode, the results of the edit are returned as a new Sample
app – if given, this application is used to open the sample. Otherwise, the application configured via the key ‘editor’ is used
fmt – the format to write the samples to
- Return type:
Sample | None
- Returns:
if wait is True, returns the sample after closing editor
- partialTrackingAnalysis(resolution=50.0, channel=0, windowsize=None, freqdrift=None, hoptime=None, mindb=-90)[source]¶
Analyze this audiosample using partial tracking
- Parameters:
resolution (float) – the resolution of the analysis, in Hz
channel – which channel to analyze
windowsize (float | None) – The window size in hz. This value needs to be higher than the resolution since the window in samples needs to be smaller than the fft analysis
mindb – the amplitude floor.
hoptime (float) – the time to move the window after each analysis. For overlap==1, this is 1/windowsize. For overlap==2, 1/(windowsize*2)
freqdrift (float) – the max. variation in frequency between two breakpoints (by default, 1/2 resolution)
- Return type:
_spectrum.Spectrum
- Returns:
a
maelzel.partialtracking.spectrum.Spectrum
See also
spectrumAt(), maelzel.partialtracking.spectrum.Spectrum.analyze()
- peaksbpf(framedur=0.01, overlap=2)[source]¶
Create a bpf representing the peaks envelope of the source
- Parameters:
framedur – the duration of an analysis frame (in seconds)
overlap – determines the hop time between analysis frames.
hoptime = framedur / overlap
- Return type:
Sampled
A peak is the absolute maximum value of a sample over a window of time (the framedur in this case). To use another metric for tracking amplitude see Sample.rmsbpf(), which uses rms. The resolution of the returned bpf will be framedur/overlap
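The peak-envelope computation (absolute maximum per frame, frames advancing by framedur/overlap) can be sketched in plain numpy. This hypothetical helper returns plain arrays instead of a bpf:

```python
import numpy as np

def peak_envelope(samples: np.ndarray, sr: int, framedur: float = 0.01,
                  overlap: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Absolute peak per analysis frame; frames advance by framedur/overlap."""
    winsize = int(framedur * sr)
    hop = max(1, winsize // overlap)
    starts = np.arange(0, len(samples) - winsize + 1, hop)
    peaks = np.array([np.max(np.abs(samples[s:s + winsize])) for s in starts])
    return starts / sr, peaks

sr = 100
sig = np.concatenate([np.full(50, 0.2), np.full(50, 0.8)])
times, peaks = peak_envelope(sig, sr, framedur=0.1, overlap=2)
print(peaks[0], peaks[-1])   # → 0.2 0.8
```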
- play(loop=False, chan=1, gain=1.0, delay=0.0, pan=None, speed=1.0, skip=0.0, dur=0, engine=None, block=False, backend='')[source]¶
Play the given sample
At the moment two playback backends are available, portaudio and csound.
If no engine is given and playback is immediate (no delay), playback is performed directly via portaudio. This has the advantage that no data must be copied to the playback engine (which is the case when using csound)
If backend is ‘csound’ or a csoundengine’s Engine is passed, csound is used as playback backend. The csound backend is recommended if sync is needed between this playback and other events.
- Parameters:
loop – should playback be looped?
chan (int) – first channel to play to. For stereo samples, output is routed to consecutive channels starting with this channel
gain – a gain modifier
delay – start delay in seconds
pan (float) – a value between 0 (left) and 1 (right). Use -1 for a default value, which is 0 for mono samples and 0.5 for stereo. For 3 or more channels pan is currently ignored
speed – the playback speed. A variation in speed will change the pitch accordingly.
skip – start playback at a given point in time
dur – duration of playback. 0 indicates to play until the end of the sample
engine (csoundengine.Engine) – the Engine instance to use for playback. If not given, playback is performed via portaudio
block – if True, block execution until playback is finished
backend – one of ‘portaudio’, ‘csound’
- Return type:
PlaybackStream
- Returns:
a PlaybackStream. This can be used to stop playback
See also
Sample.setEngine()
- plot(profile='auto')[source]¶
Plot the sample data
- Parameters:
profile – one of ‘low’, ‘medium’, ‘high’
- Return type:
list[plt.Axes]
- Returns:
a list of pyplot Axes
- plotMelSpectrogram(fftsize=2048, overlap=4, winsize=None, nmels=128, axes=None, axislabels=False, cmap='magma')[source]¶
Plot a mel-scale spectrogram
- Parameters:
fftsize – the fftsize in samples
overlap – the amount of overlap. An overlap of 4 results in a hop size of winsize // overlap samples
winsize (int) – the window size in samples. If None, fftsize is used. If given, winsize <= fftsize
nmels – number of mel bins
axes – if given, plot on these Axes
axislabels – if True, include labels on the axes
cmap (str) – a color map, by name
- Return type:
plt.Axes
- Returns:
the Axes used
- plotSpectrogram(fftsize=2048, window='hamming', winsize=None, overlap=4, mindb=-120, minfreq=40, maxfreq=12000, yaxis='linear', figsize=(24, 10), axes=None)[source]¶
Plot the spectrogram of this sound using matplotlib
- Parameters:
fftsize – the size of the fft.
window – window type. One of ‘hamming’, ‘hanning’, ‘blackman’, … (see scipy.signal.get_window)
winsize (int) – window size in samples, defaults to fftsize
mindb – the min. amplitude to plot
overlap – determines the hop size (hop size in samples = fftsize/overlap). None to infer a sensible default from the other parameters
minfreq (int) – the min. freq to plot
maxfreq (int) – the highest freq. to plot. If None, a default is estimated (check maelzel.snd.plotting.config)
yaxis – one of ‘linear’ or ‘log’
figsize – the figure size, a tuple (width: int, height: int)
axes – a matplotlib Axes object. If passed, plotting is done using this Axes; otherwise a new Axes object is created and returned
- Return type:
plt.Axes
- Returns:
the matplotlib Axes
- plotSpetrograph(framesize=2048, window='hamming', start=0.0, dur=0.0)[source]¶
Plot the spectrograph of this sample or a fragment thereof
- Parameters:
framesize – the size of each analysis, in samples
window – as passed to scipy.signal.get_window: blackman, hamming, hann, bartlett, flattop, parzen, bohman, blackmanharris, nuttall, barthann, kaiser (needs beta), gaussian (needs standard deviation)
start – if given, plot the spectrograph at this time
dur – if given, use this fragment of the sample (0=from start to end of sample)
- Return type:
None
Plots the spectrograph of the entire sample (slice before to use only a fraction)
- prependSilence(dur)[source]¶
Return a new Sample with silence of given dur at the beginning
- Parameters:
dur (float) – duration of the silence to add at the beginning
- Return type:
Sample
- Returns:
new Sample
- reprHtml(withHeader=True, withAudiotag=None, figsize=(24, 4), profile='')[source]¶
Returns an HTML representation of this Sample
This can be used within a Jupyter notebook to force the html display. It is useful inside a block where it would not be possible to put this Sample as the last element of the cell to force the html representation
- Parameters:
withHeader – include a header line with repr text (‘Sample(…)’)
withAudiotag (bool | None) – include html for audio playback. If None, this defaults to config['reprhtml_include_audiotag']
- Return type:
str
- Returns:
the HTML repr as str
Example
>>> from maelzel.snd.audiosample import Sample
>>> sample = Sample("snd/Numbers_EnglishFemale.flac")
>>> sample.reprHtml()
- rms()[source]¶
RMS of the samples
This method returns the rms for all the frames at once. As such it is only of use for short samples. The use case is as follows:
- Return type:
float
>>> from maelzel.snd.audiosample import Sample
>>> from pitchtools import amp2db
>>> s = Sample("/path/to/sample.flac")
>>> amp2db(s[0.5:0.7].rms())
-12.05
- rmsbpf(dt=0.01, overlap=1)[source]¶
Creates a BPF representing the rms of this sample over time
- Return type:
Sampled
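The windowed-RMS curve can be sketched in plain numpy. This hypothetical `rms_curve` returns plain (times, rms) arrays instead of a Sampled bpf:

```python
import numpy as np

def rms_curve(samples: np.ndarray, sr: int, dt: float = 0.01,
              overlap: int = 1) -> tuple[np.ndarray, np.ndarray]:
    """Windowed RMS, sampled every dt/overlap seconds."""
    winsize = max(1, int(dt * sr))
    hop = max(1, winsize // overlap)
    starts = np.arange(0, len(samples) - winsize + 1, hop)
    rms = np.array([np.sqrt(np.mean(samples[s:s + winsize] ** 2))
                    for s in starts])
    return starts / sr, rms

sr = 1000
sig = np.sin(2 * np.pi * 50 * np.arange(sr) / sr)   # 1 s sine at 50 Hz
times, rms = rms_curve(sig, sr, dt=0.1)
# a full-scale sine has an rms of 1/sqrt(2) in every window
```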
- scrub(bpf)[source]¶
Scrub the samples with the given curve
- Parameters:
bpf (BpfInterface) – a bpf mapping time -> time (see bpf)
- Return type:
Sample
Example:
Read sample at half speed:
>>> import bpf4
>>> sample = Sample("path.wav")
>>> dur = sample.duration
>>> sample2 = sample.scrub(bpf4.linear([(0, 0), (dur*2, dur)]))
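The underlying time-remapping can be sketched in plain numpy. `scrub` here is a hypothetical helper that takes a plain callable mapping output time to source time, instead of a bpf:

```python
import numpy as np

def scrub(samples: np.ndarray, sr: int, outdur: float, timecurve) -> np.ndarray:
    """Render outdur seconds of output, reading the source at the time
    returned by timecurve(output_time), with linear interpolation."""
    outtimes = np.arange(int(outdur * sr)) / sr
    srcindex = np.clip(timecurve(outtimes) * sr, 0, len(samples) - 1)
    return np.interp(srcindex, np.arange(len(samples)), samples)

sr = 100
src = np.arange(sr, dtype=float)       # a 1-second ramp
# Half speed: 2 s of output traversing the 1 s source
out = scrub(src, sr, outdur=2.0, timecurve=lambda t: t / 2)
print(len(out))                        # → 200
```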
- spectrumAt(time, resolution=50.0, channel=0, windowsize=-1, mindb=-90, minfreq=None, maxfreq=12000, maxcount=0)[source]¶
Analyze sinusoidal components of this Sample at the given time
- Parameters:
time (float) – the time to analyze
resolution (float) – the resolution of the analysis, in hz
channel – if this sample has multiple channels, which channel to analyze
windowsize (float) – the window size in hz
mindb – the min. amplitude in dB for a component to be included
minfreq (int | None) – the min. frequency of a component to be included
maxfreq – the max. frequency of a component to be included
maxcount – the max. number of components to include (0 to include all)
- Return type:
list[tuple[float, float]]
- Returns:
a list of pairs (frequency, amplitude) where each pair represents a sinusoidal component of this sample at the given time. Amplitudes are in the range 0-1
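The idea of extracting (frequency, amplitude) pairs at a point in time can be sketched as picking local maxima of a windowed FFT magnitude, loudest first. `spectrum_at` is a hypothetical simplification with rough Hann amplitude scaling, not the actual analysis:

```python
import numpy as np

def spectrum_at(samples: np.ndarray, sr: int, time: float,
                windowsize: float = 0.05, maxcount: int = 0):
    """Return (freq, amp) pairs for local maxima of the FFT magnitude
    around `time`, sorted loudest first."""
    n = int(windowsize * sr)
    start = max(0, int(time * sr) - n // 2)
    frame = samples[start:start + n] * np.hanning(n)
    mags = np.abs(np.fft.rfft(frame)) / (n / 4)   # rough Hann amplitude scaling
    freqs = np.fft.rfftfreq(n, 1 / sr)
    peaks = [(freqs[i], mags[i]) for i in range(1, len(mags) - 1)
             if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]]
    peaks.sort(key=lambda p: -p[1])
    return peaks[:maxcount] if maxcount else peaks

sr = 8000
t = np.arange(sr) / sr
sig = 0.8 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)
comps = spectrum_at(sig, sr, time=0.5, maxcount=2)
# loudest component first: roughly (440, 0.8), then (1000, 0.3)
```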
- strip(threshold=-120.0, margin=0.01, window=0.02)[source]¶
Remove silence from the sides. Returns a new Sample
- Parameters:
threshold – dynamic of silence, in dB
margin – leave at least this amount of time between the first/last sound and the beginning of silence
window – the duration of the analysis window, in seconds
- Return type:
Sample
- Returns:
a new Sample with silence at the sides removed
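The silence-stripping logic can be sketched in plain numpy. `strip_silence` is a hypothetical helper using non-overlapping RMS windows, which may differ from the real method:

```python
import numpy as np

def strip_silence(samples: np.ndarray, sr: int, threshold_db: float = -120.0,
                  margin: float = 0.0, window: float = 0.02) -> np.ndarray:
    """Trim windows quieter than threshold_db from both ends, keeping
    `margin` seconds of the surrounding silence."""
    winsize = max(1, int(window * sr))
    thresh = 10 ** (threshold_db / 20)
    rms = np.array([np.sqrt(np.mean(samples[i:i + winsize] ** 2))
                    for i in range(0, len(samples) - winsize + 1, winsize)])
    loud = np.flatnonzero(rms > thresh)
    if len(loud) == 0:
        return samples[:0]                     # all silence
    marginframes = int(margin * sr)
    start = max(0, loud[0] * winsize - marginframes)
    end = min(len(samples), (loud[-1] + 1) * winsize + marginframes)
    return samples[start:end]

sr = 1000
sig = np.concatenate([np.zeros(200), np.full(300, 0.5), np.zeros(500)])
out = strip_silence(sig, sr, threshold_db=-60)
print(len(out))   # → 300
```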
- stripLeft(threshold=-120.0, margin=0.01, window=0.02)[source]¶
Remove silence from the left. Returns a new Sample
- Parameters:
threshold – dynamic of silence, in dB
margin – leave at least this amount of time between the first sound and the beginning of silence
window – the duration of the analysis window, in seconds
- Return type:
Sample
- Returns:
a new Sample with silence removed
- stripRight(threshold=-120.0, margin=0.01, window=0.02)[source]¶
Remove silence from the right. Returns a new Sample
- Parameters:
threshold – dynamic of silence, in dB
margin – leave at least this amount of time between the last sound and the beginning of silence
window – the duration of the analysis window, in seconds
- Return type:
Sample
- Returns:
a new Sample with silence removed
- write(outfile, encoding='', overflow='fail', fmt='', bitrate=224, **metadata)[source]¶
Write the samples to outfile
- Parameters:
outfile (str) – the name of the soundfile. The extension determines the file format
encoding – the encoding to use. One of pcm16, pcm24, pcm32, float32, float64 or, in the case of mp3 or ogg, the frame rate as integer (160 = 160Kb)
fmt – if not given, it is inferred from the extension. One of ‘wav’, ‘aiff’, ‘flac’.
overflow – one of ‘fail’, ‘normalize’, ‘nothing’. This applies only to pcm formats (wav, aif, mp3)
bitrate – bitrate used when writing to mp3
metadata – XXX
- Return type:
None