2. Transcription
Transcribing audio into a musical representation is not a completely objective process. The kind of transcription and the methods used depend heavily on what the transcription will be used for. maelzel includes a transcribe package which implements multiple transcription strategies.
2.1. Voice Analysis / Transcription
When transcribing a human voice, which is a monophonic source with highly harmonic timbre for the pitched parts of speech/song, probably the most appropriate transcription method is based on the analysis of the fundamental frequency in combination with onset/offset prediction and other secondary features.
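To make the idea of fundamental-frequency analysis concrete, here is a minimal sketch of a pitch estimate for a single frame using autocorrelation. It is only an illustration under stated assumptions: the function name `estimate_f0`, its parameters and the confidence measure are hypothetical and are not maelzel's actual pitch tracker. It uses only the standard library so that it stays self-contained.

```python
import math


def estimate_f0(frame, sr, fmin=80.0, fmax=400.0):
    """Estimate the fundamental of one frame by autocorrelation.

    Returns (frequency in Hz, confidence between 0 and 1). The confidence
    is the autocorrelation peak relative to the energy of the frame.
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0, 0.0
    # Only lags that correspond to frequencies between fmin and fmax
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag == 0:
        # No positive correlation found: report the frame as unpitched
        return 0.0, 0.0
    return sr / best_lag, best_corr / energy


# A synthetic 200 Hz sine as a stand-in for a voiced frame
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(512)]
f0, confidence = estimate_f0(frame, sr)
```

A harmonic source yields a clear autocorrelation peak and a high confidence, while a noisy frame yields a low one; a threshold on that confidence is one simple way to separate voiced from unvoiced material.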
maelzel.transcribe.FundamentalAnalysisMonophonic implements the skeleton of such an approach:
Onset detection
The fundamental is sampled within each onset-offset timespan to include any pitch inflections.
A list of “gestures” is generated, where each gesture consists of a series of breakpoints. A breakpoint has information regarding pitch, amplitude, voicedness (how much pitch content the voice has) and other features at a given time.
Unpitched sections are analyzed using secondary features, like centroid, to characterize them in more detail.
TODO: it might be desirable to extend the analysis of the unpitched sections using more detailed features like mfb or mfcc. For voice transcription it would also be relevant to perform formant analysis and vowel/phoneme prediction to enrich the transcription.
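The steps listed above can be pictured with a small data-structure sketch: breakpoints carry the per-moment features, and onsets split them into gestures. The `Breakpoint` class and `group_by_onsets` function below are hypothetical names used for illustration only and are not maelzel's own classes; the values are loosely taken from the analysis output shown later in this section.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Breakpoint:
    time: float    # seconds
    freq: float    # Hz, 0 when no fundamental was found
    amp: float     # linear amplitude
    voiced: bool   # True if the sound is pitched at this moment


def group_by_onsets(breakpoints, onsets):
    """Split time-sorted breakpoints into gestures, one per onset."""
    groups = [[] for _ in onsets]
    for bp in breakpoints:
        # Index of the last onset at or before this breakpoint
        idx = bisect_right(onsets, bp.time) - 1
        if idx >= 0:
            groups[idx].append(bp)
    return [group for group in groups if group]


breakpoints = [
    Breakpoint(0.8707, 167.9711, 0.1047, False),
    Breakpoint(0.9966, 175.4078, 0.0946, True),
    Breakpoint(1.0565, 163.1893, 0.1097, True),
    Breakpoint(1.2423, 165.5629, 0.1000, True),
]
gestures = group_by_onsets(breakpoints, onsets=[0.8707, 1.0565, 1.2423])
```

Here the first gesture holds two breakpoints (an unvoiced start that becomes voiced) and the other two hold one each.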
[1]:
from maelzel.snd.audiosample import Sample
from maelzel import transcribe
from maelzel.core import *
import matplotlib.pyplot as plt
2.2. Configuration
When performing automatic transcription, the quantization results tend to become very complicated. For this reason it is important to limit the complexity allowed by the quantization algorithm in order to keep the results readable. As will be seen later, the resulting transcription remains very close to the original, even with low-complexity quantization.
[2]:
cfg = getConfig()
cfg['quant.complexity'] = 'low'
cfg['show.centsDeviationAsTextAnnotation'] = False # Disable text annotations for cents deviation for each note
cfg['show.horizontalSpacing'] = 'large' # More readable since every note probably has its own accidental
cfg['show.lilypondGlissandoMinimumLength'] = 3 # Makes sure that glissando lines are always shown
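The trade-off behind 'quant.complexity' can be illustrated with a toy example. The sketch below is not maelzel's quantizer; `quantize_beat` and the two sets of allowed subdivisions are hypothetical. It snaps offsets within one beat to the subdivision grid with the smallest total error, and shows that allowing fewer subdivisions gives a simpler rhythm at the cost of a somewhat larger timing error.

```python
from fractions import Fraction


def quantize_beat(offsets, divisions):
    """Snap offsets (in beats, between 0 and 1) to the best subdivision grid.

    Returns (total error, chosen division, snapped offsets).
    """
    best = None
    for div in divisions:
        snapped = [Fraction(round(x * div), div) for x in offsets]
        error = sum(abs(float(s) - x) for s, x in zip(snapped, offsets))
        if best is None or error < best[0]:
            best = (error, div, snapped)
    return best


offsets = [0.0, 0.21, 0.79]
# Hypothetical complexity levels: which subdivisions of the beat are allowed
low_error, low_div, low_grid = quantize_beat(offsets, divisions=(1, 2, 3, 4))
high_error, high_div, high_grid = quantize_beat(offsets, divisions=range(1, 9))
```

With only simple subdivisions the offsets land on sixteenth notes; with more subdivisions allowed they land on a quintuplet grid, which is closer to the original timing but harder to read.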
[3]:
# s = Sample('../snd/Numbers_EnglishFemale.flac')
# s = Sample('../snd/colours-german-male.flac')
# s = Sample("~/tmp/gliss.flac")
# s = Sample('../snd/finneganswake-fragm01.flac')
# s = Sample('../snd/voiceover-fragment-48k.flac')
s = Sample('snd/istambul2.flac')
# Only the left channel
s0 = s.getChannel(0, contiguous=True)
s0
[3]:
(Sample output: duration 20, sr=44100, numchannels=1)
Now we perform the analysis itself. Many parameters can be customized, but in general the defaults tend to produce viable results. The analysis is then plotted to show utterances split by onsets. Within each onset-offset range the predicted (and already simplified) fundamental is shown as a line. Zero-frequency sections correspond to onsets for which the confidence of the fundamental pitch prediction was too low.
[52]:
analysis = transcribe.FundamentalAnalysisMonophonic(
    s0.samples,
    sr=s0.sr,
    # Quantize the pitch to its nearest 1/8th tone
    semitoneQuantization=4,
    minSilence=0.001,
    # Mark which are the most salient onsets
    accentPercentile=0.1,
    # Simplify the pitch contour
    simplify=0.08,
    # Default: 0.07, lower for more onset sensitivity
    onsetThreshold=0.05,
    # Unvoiced utterances softer than this are not taken into
    # consideration since they are often part of the
    # background noise
    unvoicedMinAmplitudePercentile=0.3,
    overlap=16,
    onsetOverlap=8,
    onsetBacktrack=True)
plt.figure(figsize=(20, 6))
analysis.plot(spanAlpha=0.2)
mnOut size: 6891
m_pitchTrack size: 6891
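As a small illustration of what semitoneQuantization=4 in the call above implies, the sketch below rounds a frequency to the nearest quarter of a semitone (an eighth tone). The function `quantize_freq` is hypothetical, and the 442 Hz reference for A4 is an assumption, chosen because it reproduces frequencies such as 154.0302 Hz that appear in the breakpoint table of this section.

```python
import math


def quantize_freq(freq, divisions_per_semitone=4, a4=442.0):
    """Round a frequency to the nearest 1/divisions_per_semitone of a semitone.

    a4 is the assumed reference frequency for A4 (MIDI note 69).
    """
    if freq <= 0:
        return 0.0
    midi = 69 + 12 * math.log2(freq / a4)
    quantized = round(midi * divisions_per_semitone) / divisions_per_semitone
    return a4 * 2 ** ((quantized - 69) / 12)


# 154.5 Hz falls on the grid point 25 cents below Eb3
snapped = quantize_freq(154.5)
```

Under these assumptions neighbouring grid points differ by a factor of 2^(1/48), which matches the spacing between values such as 151.8219 Hz and 154.0302 Hz in the table.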
The analysis contains a series of breakpoints. These breakpoints are grouped by onset/offset region and stored in the .groups attribute of the analysis. A group contains all the breakpoints between an onset and the next onset. Notice that a group may or may not have an offset: a group without an offset indicates that there is no interruption between the end of that group and the start of the next. Keep in mind that onset prediction is based on spectral flux: an onset indicates a significant change of spectrum within a given period of time.
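The notion of an onset as a significant change of spectrum is commonly measured as spectral flux: the sum of the positive magnitude changes between consecutive spectra. The following is a minimal sketch of that measure, not maelzel's onset detector; the function names and the small frame sizes are hypothetical, and a naive DFT from the standard library is used instead of an FFT only to keep the example self-contained.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the first N/2 + 1 DFT bins of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectral_flux(samples, frame_size=64, hop=16):
    """Sum of the positive magnitude changes between consecutive frames."""
    starts = range(0, len(samples) - frame_size + 1, hop)
    spectra = [magnitude_spectrum(samples[s:s + frame_size]) for s in starts]
    return [sum(max(b - a, 0.0) for a, b in zip(prev, cur))
            for prev, cur in zip(spectra, spectra[1:])]


# 0.25 s of silence followed by 0.25 s of a 250 Hz tone
sr = 2000
silence = [0.0] * 500
tone = [math.sin(2 * math.pi * 250 * n / sr) for n in range(500)]
flux = spectral_flux(silence + tone)
peak = max(range(len(flux)), key=flux.__getitem__)
# flux[i] compares frames i and i + 1; report the center of the later frame
onset_time = ((peak + 1) * 16 + 32) / sr
```

The flux stays near zero while the spectrum is stable and peaks around the moment the tone enters, which is where an onset would be placed after thresholding.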
[27]:
analysis
[27]:
time | freq | amp | voiced | linked | onsetStrength | freqConfidence | kind | isaccent | duration | properties |
---|---|---|---|---|---|---|---|---|---|---|
0.3947 | 154.0302 | 0.0012 | False | True | 0.3751 | 0.0000 | onset | False | 0.3135 | |
0.7082 | 151.8219 | 0.1181 | False | False | 8.0438 | 0.0000 | | False | 0.0000 | |
0.7082 | 172.8930 | 0.1181 | False | True | 8.0438 | 0.0000 | onset | True | 0.1625 | |
0.8707 | 139.2213 | 0.1047 | False | False | 2.4267 | 0.0000 | | False | 0.0000 | |
0.8707 | 167.9711 | 0.1047 | False | True | 2.4267 | 0.0000 | onset | True | 0.1258 | |
0.9966 | 175.4078 | 0.0946 | True | True | 0.4189 | 0.6834 | | False | 0.0599 | |
1.0565 | 163.1893 | 0.1097 | True | False | 0.8343 | 0.7343 | | False | 0.0000 | |
1.0565 | 163.1893 | 0.1097 | True | True | 0.8343 | 0.7343 | onset | False | 0.1858 | |
1.2423 | 165.5629 | 0.1000 | True | False | 1.1559 | 0.5815 | | False | 0.0000 | |
1.2423 | 165.5629 | 0.1000 | True | True | 1.1559 | 0.5815 | onset | True | 0.2612 | |
1.5035 | 167.9711 | 0.0013 | False | False | 1.7849 | 0.0000 | | False | 0.0000 | |
1.5035 | 0.0000 | 0.0013 | False | True | 1.7849 | 0.0000 | onset | True | 0.2322 | |
1.7357 | 0.0000 | 0.0239 | False | False | 0.3449 | 0.0000 | | False | 0.0000 | |
1.7357 | 147.4998 | 0.0239 | False | True | 0.3449 | 0.0000 | onset | False | 0.1078 | |
1.8434 | 129.5234 | 0.0856 | True | True | 0.3591 | 1.0000 | | False | 0.0200 | |
1.8634 | 158.5436 | 0.0683 | True | False | 4.5178 | 0.0176 | | False | 0.0000 | |
1.8634 | 158.5436 | 0.0683 | True | True | 4.5178 | 0.0176 | onset | True | 0.1858 | |
2.0492 | 156.2706 | 0.0302 | False | False | 3.3984 | 0.0000 | | False | 0.0000 | |
2.0492 | 129.5234 | 0.0302 | False | True | 3.3984 | 0.0000 | onset | True | 0.0734 | |
2.1226 | 129.5234 | 0.0288 | True | True | 0.6460 | 0.0413 | | False | 0.0734 | |
2.1960 | 101.3289 | 0.0152 | True | True | 0.3784 | 0.4973 | | False | 0.2618 | |
2.4578 | 99.8762 | 0.0009 | False | False | 0.8554 | 0.0000 | offset | False | 0.0093 | |
2.4671 | 0.0000 | 0.0009 | False | True | 0.5128 | 0.0000 | onset | False | 0.0619 | |
2.5290 | 0.0000 | 0.0012 | False | False | 0.2495 | 0.0000 | offset | False | 0.1645 | |
2.6935 | 0.0000 | 0.0014 | False | True | 1.2353 | 0.0000 | onset | True | 0.0871 | |
2.7806 | 0.0000 | 0.0013 | False | False | 1.8610 | 0.0000 | | False | 0.0000 | |
2.7806 | 0.0000 | 0.0013 | False | True | 1.8610 | 0.0000 | onset | True | 0.5367 | |
3.3173 | 0.0000 | 0.0011 | False | False | 0.2848 | 0.0000 | offset | False | 0.4675 | |
3.7849 | 165.5629 | 0.0921 | False | True | 3.2361 | 0.0000 | onset | True | 0.1741 | |
3.9590 | 131.4074 | 0.1086 | False | False | 3.1797 | 0.0000 | | False | 0.0000 | |
3.9590 | 165.5629 | 0.1086 | False | True | 3.1797 | 0.0000 | onset | True | 0.1683 | |
4.1273 | 172.8930 | 0.1093 | True | False | 0.4700 | 0.5129 | | False | 0.0000 | |
4.1273 | 172.8930 | 0.1093 | True | True | 0.4700 | 0.5129 | onset | False | 0.2090 | |
4.3363 | 165.5629 | 0.0966 | True | False | 1.3259 | 0.5701 | | False | 0.0000 | |
4.3363 | 165.5629 | 0.0966 | True | True | 1.3259 | 0.5701 | onset | True | 0.0578 | |
4.3941 | 177.9592 | 0.0943 | True | True | 0.7116 | 0.4555 | | False | 0.2151 | |
4.6092 | 172.8930 | 0.0084 | True | False | 0.5496 | 0.7848 | | False | 0.0000 | |
4.6092 | 172.8930 | 0.0084 | True | True | 0.5496 | 0.7848 | onset | False | 0.0397 | |
4.6489 | 147.4998 | 0.0027 | True | True | 0.4243 | 0.3823 | | False | 0.1847 | |
4.8336 | 147.4998 | 0.0476 | False | True | 1.1359 | 0.0202 | | False | 0.0020 | |
4.8356 | 158.5436 | 0.0479 | True | False | 1.1429 | 0.0639 | | False | 0.0000 | |
4.8356 | 158.5436 | 0.0479 | True | True | 1.1429 | 0.0639 | onset | True | 0.0773 | |
4.9129 | 149.6452 | 0.1306 | True | True | 1.2890 | 0.5596 | | False | 0.0754 | |
4.9883 | 172.8930 | 0.0893 | True | True | 0.9242 | 0.6797 | | False | 0.1666 | |
5.1548 | 137.2253 | 0.0983 | False | False | 2.9877 | 0.0111 | | False | 0.0000 | |
5.1548 | 160.8497 | 0.0983 | False | True | 2.9877 | 0.0111 | onset | True | 0.0813 | |
5.2361 | 163.1893 | 0.0440 | True | False | 2.0383 | 0.6287 | | False | 0.0000 | |
5.2361 | 163.1893 | 0.0440 | True | True | 2.0383 | 0.6287 | onset | True | 0.1509 | |
5.3870 | 156.2706 | 0.1171 | False | False | 0.8719 | 0.0000 | | False | 0.0000 | |
5.3870 | 194.0659 | 0.1171 | False | True | 0.8719 | 0.0000 | onset | False | 0.0773 | |
5.4643 | 191.2836 | 0.0657 | True | True | 0.5144 | 0.4880 | | False | 0.1486 | |
5.6129 | 156.2706 | 0.0161 | True | True | 0.5846 | 0.1090 | | False | 0.1863 | |
5.7992 | 156.2706 | 0.0653 | False | False | 2.8919 | 0.0000 | | False | 0.0000 | |
5.7992 | 163.1893 | 0.0653 | False | True | 2.8919 | 0.0000 | onset | True | 0.1270 | |
5.9262 | 145.3851 | 0.0830 | True | True | 0.7192 | 0.5901 | | False | 0.0933 | |
6.0194 | 167.9711 | 0.0631 | True | True | 0.6940 | 0.5892 | | False | 0.1687 | |
6.1881 | 158.5436 | 0.0177 | True | False | 0.5140 | 0.6107 | | False | 0.0000 | |
6.1881 | 158.5436 | 0.0177 | True | True | 0.5140 | 0.6107 | onset | False | 0.1045 | |
6.2926 | 131.4074 | 0.0800 | True | False | 1.9536 | 0.5357 | | False | 0.0000 | |
6.2926 | 131.4074 | 0.0800 | True | True | 1.9536 | 0.5357 | onset | True | 0.0300 | |
6.3226 | 154.0302 | 0.0846 | True | True | 2.6586 | 0.0401 | | False | 0.1500 | |
6.4726 | 160.8497 | 0.0449 | True | False | 2.0648 | 0.5671 | | False | 0.0000 | |
6.4726 | 160.8497 | 0.0449 | True | True | 2.0648 | 0.5671 | onset | True | 0.0464 | |
6.5190 | 154.0302 | 0.0336 | True | False | 1.4298 | 0.6445 | | False | 0.0000 | |
6.5190 | 154.0302 | 0.0336 | True | True | 1.4298 | 0.6445 | onset | True | 0.0697 | |
6.5887 | 141.2463 | 0.1048 | False | False | 1.0312 | 0.0000 | | False | 0.0000 | |
6.5887 | 160.8497 | 0.1048 | False | True | 1.0312 | 0.0000 | onset | True | 0.1335 | |
6.7222 | 170.4143 | 0.0606 | True | False | 0.1587 | 0.6729 | | False | 0.0000 | |
6.7222 | 170.4143 | 0.0606 | True | True | 0.1587 | 0.6729 | onset | False | 0.0639 | |
6.7860 | 158.5436 | 0.0633 | True | False | 1.7349 | 0.0800 | | False | 0.0000 | |
6.7860 | 158.5436 | 0.0633 | True | True | 1.7349 | 0.0800 | onset | True | 0.1103 | |
6.8963 | 135.2579 | 0.0863 | False | False | 2.6376 | 0.0000 | | False | 0.0000 | |
6.8963 | 147.4998 | 0.0863 | False | True | 2.6376 | 0.0000 | onset | True | 0.1330 | |
7.0293 | 163.1893 | 0.0878 | True | True | 0.5457 | 0.7880 | | False | 0.1548 | |
7.1841 | 205.6056 | 0.0134 | True | True | 0.4116 | 0.4224 | | False | 0.0198 | |
7.2040 | 191.2836 | 0.0076 | True | False | 1.0103 | 0.5191 | | False | 0.0000 | |
7.2040 | 191.2836 | 0.0076 | True | True | 1.0103 | 0.5191 | onset | False | 0.6162 | |
7.8202 | 191.2836 | 0.0010 | False | False | 0.4973 | 0.0000 | offset | False | 0.1094 | |
7.9296 | 158.5436 | 0.0230 | False | True | 1.5386 | 0.0000 | onset | True | 0.0755 | |
8.0051 | 156.2706 | 0.0193 | True | False | 0.4818 | 0.5666 | | False | 0.0000 | |
8.0051 | 156.2706 | 0.0193 | True | True | 0.4818 | 0.5666 | onset | False | 0.0813 | |
8.0863 | 131.4074 | 0.1218 | False | False | 5.3221 | 0.0000 | | False | 0.0000 | |
8.0863 | 163.1893 | 0.1218 | False | True | 5.3221 | 0.0000 | onset | True | 0.1706 | |
8.2570 | 143.3008 | 0.1423 | False | True | 1.2315 | 0.0079 | | False | 0.0357 | |
8.2927 | 172.8930 | 0.1197 | True | True | 0.7477 | 0.5757 | | False | 0.1071 | |
8.3998 | 170.4143 | 0.1363 | True | False | 0.7505 | 0.7437 | | False | 0.0000 | |
8.3998 | 170.4143 | 0.1363 | True | True | 0.7505 | 0.7437 | onset | False | 0.1804 | |
8.5802 | 158.5436 | 0.0878 | True | True | 0.2578 | 0.5662 | | False | 0.0892 | |
8.6695 | 183.1738 | 0.0902 | True | True | 0.4079 | 0.4511 | | False | 0.1150 | |
8.7844 | 191.2836 | 0.0552 | True | True | 0.4193 | 0.4455 | | False | 0.2201 | |
9.0045 | 183.1738 | 0.0313 | False | True | 1.0990 | 0.0116 | | False | 0.0397 | |
9.0442 | 160.8497 | 0.0272 | True | False | 2.2514 | 0.4280 | | False | 0.0000 | |
9.0442 | 160.8497 | 0.0272 | True | True | 2.2514 | 0.4280 | onset | True | 0.0900 | |
9.1341 | 139.2213 | 0.0744 | False | True | 1.1895 | 0.0158 | | False | 0.0180 | |
9.1521 | 160.8497 | 0.0583 | True | True | 0.7970 | 0.0981 | | False | 0.0720 | |
9.2241 | 141.2463 | 0.0113 | False | False | 1.6149 | 0.0919 | | False | 0.0000 | |
9.2241 | 0.0000 | 0.0113 | False | True | 1.6149 | 0.0919 | onset | True | 0.0580 | |
9.2822 | 0.0000 | 0.0639 | False | False | 1.9128 | 0.0000 | | False | 0.0000 | |
9.2822 | 167.9711 | 0.0639 | False | True | 1.9128 | 0.0000 | onset | True | 0.1189 | |
9.4011 | 158.5436 | 0.1219 | True | True | 0.4153 | 0.6361 | | False | 0.1249 | |
9.5260 | 165.5629 | 0.0869 | True | False | 2.7008 | 0.0919 | | False | 0.0000 | |
9.5260 | 165.5629 | 0.0869 | True | True | 2.7008 | 0.0919 | onset | True | 0.1053 | |
9.6313 | 194.0659 | 0.0198 | True | True | 0.5172 | 0.5055 | | False | 0.1153 | |
9.7466 | 194.0659 | 0.0744 | False | False | 5.5841 | 0.0000 | | False | 0.0000 | |
9.7466 | 177.9592 | 0.0744 | False | True | 5.5841 | 0.0000 | onset | True | 0.0522 | |
9.7988 | 175.4078 | 0.1098 | True | False | 1.5366 | 0.1470 | | False | 0.0000 | |
9.7988 | 175.4078 | 0.1098 | True | True | 1.5366 | 0.1470 | onset | True | 0.1335 | |
9.9323 | 160.8497 | 0.0660 | True | False | 0.9682 | 0.6589 | | False | 0.0000 | |
9.9323 | 160.8497 | 0.0660 | True | True | 0.9682 | 0.6589 | onset | False | 0.0614 | |
9.9938 | 145.3851 | 0.0477 | True | True | 0.2198 | 0.5126 | | False | 0.1824 | |
10.1761 | 139.2213 | 0.0071 | True | False | 1.6309 | 0.5569 | | False | 0.0000 | |
10.1761 | 139.2213 | 0.0071 | True | True | 1.6309 | 0.5569 | onset | True | 0.0871 | |
10.2632 | 133.3188 | 0.0062 | False | False | 3.5289 | 0.0000 | | False | 0.0000 | |
10.2632 | 0.0000 | 0.0062 | False | True | 3.5289 | 0.0000 | onset | True | 0.0987 | |
10.3619 | 0.0000 | 0.0865 | False | False | 4.7550 | 0.0000 | | False | 0.0000 | |
10.3619 | 175.4078 | 0.0865 | False | True | 4.7550 | 0.0000 | onset | True | 0.0417 | |
10.4036 | 165.5629 | 0.0888 | True | True | 1.4461 | 0.4936 | | False | 0.1211 | |
10.5247 | 194.0659 | 0.0080 | True | True | 0.3207 | 0.5469 | | False | 0.1926 | |
10.7173 | 185.8381 | 0.0010 | False | False | 0.2314 | 0.0000 | offset | False | 0.7186 | |
11.4358 | 133.3188 | 0.0926 | True | True | 3.4477 | 0.2550 | onset | True | 0.0302 | |
11.4660 | 158.5436 | 0.1021 | True | True | 0.4259 | 0.4335 | | False | 0.1207 | |
11.5868 | 160.8497 | 0.0963 | True | False | 2.0686 | 0.5882 | | False | 0.0000 | |
11.5868 | 160.8497 | 0.0963 | True | True | 2.0686 | 0.5882 | onset | True | 0.0929 | |
11.6796 | 158.5436 | 0.0520 | False | False | 2.1780 | 0.0875 | | False | 0.0000 | |
11.6796 | 0.0000 | 0.0520 | False | True | 2.1780 | 0.0875 | onset | True | 0.1219 | |
11.8015 | 0.0000 | 0.1026 | False | False | 0.7592 | 0.0000 | | False | 0.0000 | |
11.8015 | 194.0659 | 0.1026 | False | True | 0.7592 | 0.0000 | onset | False | 0.0464 | |
11.8480 | 188.5412 | 0.0543 | True | False | 1.4402 | 0.2057 | | False | 0.0000 | |
11.8480 | 188.5412 | 0.0543 | True | True | 1.4402 | 0.2057 | onset | True | 0.0481 | |
11.8961 | 185.8381 | 0.0345 | True | True | 0.6462 | 0.6121 | | False | 0.0622 | |
11.9583 | 139.2213 | 0.0656 | True | False | 4.2064 | 0.0282 | | False | 0.0000 | |
11.9583 | 139.2213 | 0.0656 | True | True | 4.2064 | 0.0282 | onset | True | 0.0100 | |
11.9682 | 158.5436 | 0.0712 | True | True | 3.3945 | 0.0100 | | False | 0.1115 | |
12.0798 | 165.5629 | 0.0401 | True | True | 1.2763 | 0.6301 | | False | 0.0817 | |
12.1615 | 151.8219 | 0.1227 | False | False | 2.8366 | 0.0000 | | False | 0.0000 | |
12.1615 | 167.9711 | 0.1227 | False | True | 2.8366 | 0.0000 | onset | True | 0.2612 | |
12.4227 | 170.4143 | 0.0518 | False | False | 6.2205 | 0.0000 | | False | 0.0000 | |
12.4227 | 156.2706 | 0.0518 | False | True | 6.2205 | 0.0000 | onset | True | 0.2089 | |
12.6315 | 143.3008 | 0.0479 | True | True | 0.7659 | 0.0206 | | False | 0.0756 | |
12.7071 | 147.4998 | 0.0423 | True | False | 1.3421 | 0.5800 | | False | 0.0000 | |
12.7071 | 147.4998 | 0.0423 | True | True | 1.3421 | 0.5800 | onset | True | 0.0639 | |
12.7710 | 135.2579 | 0.0285 | True | False | 1.4383 | 0.4920 | | False | 0.0000 | |
12.7710 | 135.2579 | 0.0285 | True | True | 1.4383 | 0.4920 | onset | True | 0.0987 | |
12.8697 | 129.5234 | 0.0680 | False | False | 1.3549 | 0.0000 | | False | 0.0000 | |
12.8697 | 172.8930 | 0.0680 | False | True | 1.3549 | 0.0000 | onset | True | 0.1277 | |
12.9974 | 133.3188 | 0.0859 | True | False | 0.1963 | 0.2329 | | False | 0.0000 | |
12.9974 | 133.3188 | 0.0859 | True | True | 0.1963 | 0.2329 | onset | False | 0.0378 | |
13.0351 | 154.0302 | 0.0821 | True | True | 2.0038 | 0.0100 | | False | 0.1828 | |
13.2180 | 137.2253 | 0.0111 | True | False | 0.8560 | 0.1032 | | False | 0.0000 | |
13.2180 | 137.2253 | 0.0111 | True | True | 0.8560 | 0.1032 | onset | False | 0.0678 | |
13.2858 | 120.5011 | 0.0029 | True | True | 0.5886 | 0.0913 | | False | 0.1934 | |
13.4792 | 131.4074 | 0.0567 | False | False | 4.6962 | 0.0000 | | False | 0.0000 | |
13.4792 | 163.1893 | 0.0567 | False | True | 4.6962 | 0.0000 | onset | True | 0.1393 | |
13.6185 | 156.2706 | 0.0226 | False | False | 4.8705 | 0.0000 | | False | 0.0000 | |
13.6185 | 151.8219 | 0.0226 | False | True | 4.8705 | 0.0000 | onset | True | 0.1103 | |
13.7288 | 131.4074 | 0.0329 | False | False | 1.9961 | 0.0000 | | False | 0.0000 | |
13.7288 | 139.2213 | 0.0329 | False | True | 1.9961 | 0.0000 | onset | True | 0.1277 | |
13.8565 | 139.2213 | 0.0307 | True | False | 1.2858 | 0.0101 | | False | 0.0000 | |
13.8565 | 139.2213 | 0.0307 | True | True | 1.2858 | 0.0101 | onset | True | 0.1071 | |
13.9636 | 95.6418 | 0.0137 | True | True | 0.2511 | 0.2286 | | False | 0.0813 | |
14.0449 | 97.0329 | 0.0295 | True | True | 0.2771 | 0.1815 | | False | 0.0436 | |
14.0885 | 113.7379 | 0.0256 | True | True | 0.7078 | 0.6753 | | False | 0.1249 | |
14.2134 | 97.0329 | 0.0040 | True | True | 0.3597 | 0.7748 | | False | 0.1269 | |
14.3404 | 102.8028 | 0.0014 | False | False | 0.1906 | 0.0000 | offset | False | 0.7526 | |
15.0930 | 160.8497 | 0.0485 | False | True | 0.9308 | 0.0000 | onset | False | 0.1954 | |
15.2883 | 145.3851 | 0.0756 | False | True | 3.0332 | 0.0031 | | False | 0.0020 | |
15.2903 | 163.1893 | 0.0754 | True | False | 2.2525 | 0.0100 | | False | 0.0000 | |
15.2903 | 163.1893 | 0.0754 | True | True | 2.2525 | 0.0100 | onset | True | 0.1451 | |
15.4355 | 141.2463 | 0.1522 | False | False | 4.3779 | 0.0000 | | False | 0.0000 | |
15.4355 | 199.7524 | 0.1522 | False | True | 4.3779 | 0.0000 | onset | True | 0.0773 | |
15.5128 | 202.6579 | 0.0811 | True | True | 0.4225 | 0.4576 | | False | 0.1229 | |
15.6357 | 154.0302 | 0.0699 | True | True | 0.4427 | 0.5781 | | False | 0.1566 | |
15.7924 | 194.0659 | 0.0038 | True | True | 0.3769 | 0.6508 | | False | 0.1249 | |
15.9173 | 180.5477 | 0.0014 | False | False | 2.4315 | 0.0000 | | False | 0.0000 | |
15.9173 | 0.0000 | 0.0014 | False | True | 2.4315 | 0.0000 | onset | True | 0.4228 | |
16.3401 | 0.0000 | 0.0010 | False | False | 0.1324 | 0.0000 | offset | False | 0.1055 | |
16.4455 | 158.5436 | 0.1093 | False | True | 3.4108 | 0.0000 | onset | True | 0.0986 | |
16.5441 | 147.4998 | 0.1001 | True | True | 0.6687 | 0.4285 | | False | 0.0523 | |
16.5965 | 163.1893 | 0.1416 | True | False | 1.9066 | 0.7165 | | False | 0.0000 | |
16.5965 | 163.1893 | 0.1416 | True | True | 1.9066 | 0.7165 | onset | True | 0.0871 | |
16.6835 | 205.6056 | 0.0257 | True | False | 1.5198 | 0.5117 | | False | 0.0000 | |
16.6835 | 205.6056 | 0.0257 | True | True | 1.5198 | 0.5117 | onset | True | 0.0580 | |
16.7416 | 205.6056 | 0.0482 | True | False | 0.1604 | 0.4567 | | False | 0.0000 | |
16.7416 | 205.6056 | 0.0482 | True | True | 0.1604 | 0.4567 | onset | False | 0.3541 | |
17.0957 | 205.6056 | 0.1146 | False | False | 3.0281 | 0.0115 | | False | 0.0000 | |
17.0957 | 165.5629 | 0.1146 | False | True | 3.0281 | 0.0115 | onset | True | 0.0834 | |
17.1791 | 154.0302 | 0.0881 | True | True | 0.1235 | 0.7391 | | False | 0.1430 | |
17.3221 | 160.8497 | 0.0149 | True | False | 2.7342 | 0.6961 | | False | 0.0000 | |
17.3221 | 160.8497 | 0.0149 | True | True | 2.7342 | 0.6961 | onset | True | 0.0580 | |
17.3801 | 170.4143 | 0.0397 | True | False | 1.8166 | 0.2023 | | False | 0.0000 | |
17.3801 | 170.4143 | 0.0397 | True | True | 1.8166 | 0.2023 | onset | True | 0.2496 | |
17.6298 | 170.4143 | 0.0433 | False | False | 2.7046 | 0.0000 | | False | 0.0000 | |
17.6298 | 167.9711 | 0.0433 | False | True | 2.7046 | 0.0000 | onset | True | 0.1103 | |
17.7400 | 163.1893 | 0.0909 | True | False | 1.9407 | 0.0721 | | False | 0.0000 | |
17.7400 | 163.1893 | 0.0909 | True | True | 1.9407 | 0.0721 | onset | True | 0.1031 | |
17.8431 | 149.6452 | 0.0154 | True | True | 0.4730 | 0.6143 | | False | 0.1388 | |
17.9819 | 149.6452 | 0.0974 | False | True | 3.2106 | 0.0032 | | False | 0.0020 | |
17.9839 | 170.4143 | 0.0981 | True | False | 2.3812 | 0.0100 | | False | 0.0000 | |
17.9839 | 170.4143 | 0.0981 | True | True | 2.3812 | 0.0100 | onset | True | 0.2557 | |
18.2396 | 175.4078 | 0.0690 | True | True | 0.3838 | 0.5010 | | False | 0.1506 | |
18.3902 | 151.8219 | 0.0324 | True | False | 1.4884 | 0.8043 | | False | 0.0000 | |
18.3902 | 151.8219 | 0.0324 | True | True | 1.4884 | 0.8043 | onset | True | 0.0813 | |
18.4715 | 151.8219 | 0.0676 | True | False | 0.6535 | 0.1022 | | False | 0.0000 | |
18.4715 | 151.8219 | 0.0676 | True | True | 0.6535 | 0.1022 | onset | False | 0.0987 | |
18.5702 | 156.2706 | 0.0422 | True | False | 2.0092 | 0.6058 | | False | 0.0000 | |
18.5702 | 156.2706 | 0.0422 | True | True | 2.0092 | 0.6058 | onset | True | 0.1741 | |
18.7443 | 158.5436 | 0.0527 | True | False | 1.2417 | 0.4150 | | False | 0.0000 | |
18.7443 | 158.5436 | 0.0527 | True | True | 1.2417 | 0.4150 | onset | True | 0.1378 | |
18.8821 | 151.8219 | 0.0496 | False | True | 0.9473 | 0.0047 | | False | 0.0020 | |
18.8841 | 221.0000 | 0.0457 | True | True | 1.0064 | 0.0100 | | False | 0.1098 | |
18.9939 | 208.5962 | 0.0078 | False | False | 5.7084 | 0.0000 | | False | 0.0000 | |
18.9939 | 205.6056 | 0.0078 | False | True | 5.7084 | 0.0000 | onset | True | 0.0929 | |
19.0868 | 208.5962 | 0.0637 | False | False | 4.8244 | 0.0000 | | False | 0.0000 | |
19.0868 | 160.8497 | 0.0637 | False | True | 4.8244 | 0.0000 | onset | True | 0.1191 | |
19.2059 | 133.3188 | 0.0469 | True | True | 0.3761 | 0.3766 | | False | 0.1131 | |
19.3190 | 133.3188 | 0.0499 | True | False | 0.5350 | 0.5701 | | False | 0.0000 | |
19.3190 | 133.3188 | 0.0499 | True | True | 0.5350 | 0.5701 | onset | False | 0.1509 | |
19.4699 | 131.4074 | 0.0480 | True | False | 1.8313 | 0.3762 | | False | 0.0000 | |
19.4699 | 131.4074 | 0.0480 | True | True | 1.8313 | 0.3762 | onset | True | 0.1404 | |
19.6104 | 125.8361 | 0.0730 | True | True | 0.3846 | 0.0949 | | False | 0.1068 | |
19.7172 | 183.1738 | 0.0180 | True | True | 0.5501 | 0.4861 | | False | 0.2828 | |
20.0000 | 194.0659 | 0.0011 | False | False | 0.4973 | 0.0000 | | False | 0.0000 | |
The transcription itself transforms each group of breakpoints into a series of notes. By default, groups are enclosed within slurs and accents mark particularly salient onsets. An ‘x’ notehead indicates that at the given moment the sound was noisy / unpitched.
[56]:
options = transcribe.TranscriptionOptions(unvoicedPitch='4G', addSlurs=True, debug=False, addGliss=True, addAccents=True)
v = analysis.transcribe(options=options)
v
[maelzel.scoring:notation.py:536:addSpanner:WARNING] A Notation cannot be assigned both start and end of a spanner. Removing the partner spannerself=«3D+ 1.625:1.75 1/8♩ noteheads=['0:cross'] attachments=[Articulation(kind=accent)] spanners=[Slur(kind=start, linetype=solid, nestingLevel=1, uuid=rdmciucp)]», spanner=Slur(kind=end, linetype=solid, nestingLevel=1, uuid=rdmciucp), partner=Slur(kind=start, linetype=solid, nestingLevel=1, uuid=rdmciucp), end=None
[maelzel.scoring:renderlily.py:583:_handleSpannerPost:ERROR] Two many nested slurs: Slur(kind=start, linetype=solid, nestingLevel=5, parent=«3E>gliss 0.9:1 1/10♩ 5/4 noteheads=['0:cross'] attachments=[Articulation(kind=accent)] spanners=[Slur(kind=start, linetype=solid, nestingLevel=5, parent=«3E>gliss 0.9:1 1/10♩ 5/4 noteheads=['0:cross'] attachments=[Articulation(kind=accent)] spanners=[...]», uuid=kda2ogjp)]», uuid=kda2ogjp), skipping
[maelzel.scoring:renderlily.py:583:_handleSpannerPost:ERROR] Two many nested slurs: Slur(kind=start, linetype=solid, nestingLevel=4, parent=«3Eb+25gliss 0.833:0.917 1/12♩ 3/2 attachments=[Articulation(kind=accent)] spanners=[Slur(kind=start, linetype=solid, nestingLevel=4, parent=«3Eb+25gliss 0.833:0.917 1/12♩ 3/2 attachments=[Articulation(kind=accent)] spanners=[...]», uuid=i3nqfv21)]», uuid=i3nqfv21), skipping
[maelzel.scoring:renderlily.py:583:_handleSpannerPost:ERROR] Two many nested slurs: Slur(kind=start, linetype=solid, nestingLevel=3, parent=«3E-gliss 1.917:2 1/12♩ 3/2 spanners=[Slur(kind=start, linetype=solid, nestingLevel=3, parent=«3E-gliss 1.917:2 1/12♩ 3/2 spanners=[...]», uuid=9oqjjj3l)]», uuid=9oqjjj3l), skipping
[56]:
Voice([3Eb<:-58dB:0.313♩:offset=0.395:symbols=[Notehead(shape=cross)], 3F<:-19dB:0.163♩:offset=0.708:symbols=[Notehead(shape=cross), Articulation(kind=accent)], 3E>:-20dB:0.126♩:offset=0.871:gliss=True:symbols=[Notehead(shape=cross), Articulation(kind=accent), Slur(anchor=Slur, kind=start, linetype=solid, partnerSpanner=Slur, uuid=kda2ogjp)], 3F:-20dB:0.06♩:offset=0.997:symbols=[Slur(anchor=Slur, kind=end, linetype=solid, partnerSpanner=Slur, uuid=kda2ogjp)], 3E<:-19dB:0.186♩:offset=1.057, 3E:-20dB:0.179♩:offset=1.242:gliss=True:symbols=[Articulation(kind=accent), Slur(anchor=Slur, kind=start, linetype=solid, partnerSpanner=Slur, uuid=zlc78b5i)], 3F<:-41dB:0.036♩:offset=1.422:gliss=True, 3D+:-53dB:0.046♩:offset=1.458:symbols=[Slur(anchor=Slur, kind=end, linetype=solid, partnerSpanner=Slur, uuid=zlc78b5i)], 4G:-57dB:0.232♩:offset=1.503:symbols=[Notehead(shape=cross), Articulation(kind=accent)], 3D:-32dB:0.108♩:offset=1.736:gliss=True:symbols=[Notehead(shape=cross), Slur(anchor=Slur, kind=start, linetype=solid, partnerSpanner=Slur, uuid=7108uupy)], …], dur=20, offset=0)
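The dB values in the Voice output above are consistent with the usual conversion from linear amplitude to decibels, applied to the amp column of the breakpoints. A minimal sketch follows; `amp_to_db` and its silence floor are hypothetical and not necessarily the function maelzel uses.

```python
import math


def amp_to_db(amp, floor=-120.0):
    """Convert a linear amplitude to decibels, clamping silence to a floor."""
    if amp <= 0:
        return floor
    return max(20 * math.log10(amp), floor)


# Amplitudes of the first two breakpoints in the analysis table
first = round(amp_to_db(0.0012))    # -58, shown as -58dB in the first note
second = round(amp_to_db(0.1181))   # -19, shown as -19dB in the second note
```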
2.3. Playback
Below is a simple approach to sonifying the transcription: different presets are defined for sounds which are fully voiced, partially voiced or completely unpitched. With more accurate feature detection for the unpitched sections and with formant prediction it might be possible to produce a much more accurate result.
[61]:
defPreset('unpitched', r"""
|icentroid, iq=20|
asig pinker
asig *= kamp
iband = icentroid / iq
aout1 resonr asig, icentroid, iband
""")
defPreset('unvoiced', r"""
|icentroid, iq=20|
asig vco2 1, kfreq
anoise = pinker() * 0.1
iband = icentroid / iq
anoise resonr anoise, icentroid, iband
aout1 = (asig + anoise) * kamp
""");
[69]:
for n in v.items:
    if n.playargs:
        n.playargs._checkArgs()
    if n.getProperty('voiced'):
        n.setPlay(instr='saw', args={'kcutoffratio': 8})
    elif n.getProperty('unvoicedGroup'):
        n.setPlay(instr='unpitched', args={'icentroid': n.getProperty('centroid'), 'iq': 10}, gain=1, fade=0)
    else:
        n.setPlay(instr='unvoiced', args={'icentroid': n.getProperty('centroid'), 'iq': 40}, gain=1)
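The selection logic in the loop above can be isolated as a plain function, which makes it easy to check without an audio engine. This is only a sketch: `choose_preset` and `filter_bandwidth` are hypothetical helpers, the property names are the ones used in the loop, and the bandwidth formula mirrors the `iband = icentroid / iq` line of the presets.

```python
def choose_preset(properties):
    """Return (preset name, args) for the analysis properties of one note."""
    if properties.get('voiced'):
        # Fully voiced: a filtered sawtooth
        return 'saw', {'kcutoffratio': 8}
    centroid = properties.get('centroid')
    if properties.get('unvoicedGroup'):
        # Completely unpitched group: filtered noise, wider band (lower Q)
        return 'unpitched', {'icentroid': centroid, 'iq': 10}
    # Partially voiced: oscillator plus narrow-band noise (higher Q)
    return 'unvoiced', {'icentroid': centroid, 'iq': 40}


def filter_bandwidth(centroid, q):
    """Bandwidth in Hz of the resonant filter, as in iband = icentroid / iq."""
    return centroid / q


name, args = choose_preset({'unvoicedGroup': True, 'centroid': 2000})
band = filter_bandwidth(args['icentroid'], args['iq'])
```

With a centroid of 2000 Hz, the unpitched preset filters the noise with a 200 Hz band, while the partially voiced preset would use a 50 Hz band around the same centroid.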
We load the original sound into a Clip in order to easily play it in sync with the analysis.
[63]:
cl = Clip(s.path)
mnOut size: 22
m_pitchTrack size: 22
mnOut size: 22
m_pitchTrack size: 22
mnOut size: 22
m_pitchTrack size: 22
mnOut size: 22
m_pitchTrack size: 22
Normally one would call play as in the cell below. In order for the generated audio to be playable online, we render the result to disk.
[75]:
play(
v.events(position=0, gain=2, sustain=0.05, fade=(0.01, 0.05)),
cl.events(position=1, delay=0.)
)
[75]:
(Output: 85 synths, listed by instrument below)
Instr: preset:unvoiced - 31 synths
p1 | start | dur | p4 | 5:kpos | 6:kgain | 7:idataidx_ | 8:inumbps | 9:ibplen | 10:ichan | 11:ifadein | ... |
---|---|---|---|---|---|---|---|---|---|---|---|
502.0451 𝍪 | 0.426 | 0.313 | 0 | 0 | 1 | 17 | 2 | 3 | 1 | ... | |
502.0452 𝍪 | 0.739 | 0.163 | 0 | 0 | 1 | 17 | 2 | 3 | 1 | ... | |
502.0453 𝍪 | 0.902 | 0.236 | 0 | 0 | 1 | 17 | 4 | 3 | 1 | ... | |
502.0454 𝍪 | 1.767 | 0.178 | 0 | 0 | 1 | 17 | 4 | 3 | 1 | ... | |
... |
Instr: preset:saw - 45 synths
p1 | start | dur | p4 | 5:kpos | 6:kgain | 7:idataidx_ | 8:inumbps | 9:ibplen | 10:ichan | 11:ifadein | 12:ifadeout | 13:ipchintrp_ | 14:ifadekind | 15:ktransp | 16:klag | 17:kcutoffratio | 18:kfilterq | 19 | 20 | 21 | 22 | 23 | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
503.0869 𝍪 | 1.087 | 0.186 | 0 | 0 | 2 | 19 | 2 | 3 | 1 | 0.01 | 0.05 | 0 | 1 | 0 | 0.1 | 8 | 3 | 0 | 51.75 | 0.10966 | 0.18576 | ... | |
503.087 𝍪 | 1.273 | 0.311 | 0 | 0 | 2 | 19 | 5 | 3 | 1 | 0.01 | 0.05 | 0 | 1 | 0 | 0.1 | 8 | 3 | 0 | 52 | 0.099954 | 0.17947 | ... | |
503.0871 𝍪 | 1.894 | 0.186 | 0 | 0 | 2 | 19 | 2 | 3 | 1 | 0.01 | 0.05 | 0 | 1 | 0 | 0.1 | 8 | 3 | 0 | 51.25 | 0.068307 | 0.18576 | ... | |
503.0872 𝍪 | 4.158 | 0.209 | 0 | 0 | 2 | 19 | 2 | 3 | 1 | 0.01 | 0.05 | 0 | 1 | 0 | 0.1 | 8 | 3 | 0 | 52.75 | 0.10933 | 0.20898 | ... | |
... |
Instr: preset:unpitched - 8 synths
p1 | start | dur | p4 | 5:kpos | 6:kgain | 7:idataidx_ | 8:inumbps | 9:ibplen | 10:ichan | 11:ifadein | ... |
---|---|---|---|---|---|---|---|---|---|---|---|
501.0118 𝍪 | 1.534 | 0.282 | 0 | 0 | 1 | 17 | 3 | 3 | 1 | ... | |
501.0119 𝍪 | 2.498 | 0.112 | 0 | 0 | 1 | 17 | 3 | 3 | 1 | ... | |
501.012 𝍪 | 2.724 | 0.137 | 0 | 0 | 1 | 17 | 3 | 3 | 1 | ... | |
501.0121 𝍪 | 2.811 | 0.587 | 0 | 0 | 1 | 17 | 3 | 3 | 1 | ... | |
... |
Instr: preset:_clip_diskin - 1 synths
p1 | start | dur | p4 | 5:kpos | 6:kgain | 7:idataidx_ | 8:inumbps | 9:ibplen | 10:ichan | 11:ifadein | 12:ifadeout | 13:ipchintrp_ | 14:ifadekind | 15:ipath | 16:isndfilechan | 17:kspeed | 18:iskip | 19:iwrap | 20:iwinsize | 21 | 22 | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
504.0016 𝍪 | 0.031 | 20.000 | 0 | 1 | 1 | 21 | 2 | 3 | 1 | 0.02 | 0.02 | 0 | 1 | snd/istambul2.flac | -1 | 1 | 0 | 0 | 4 | 0 | ... |
[24]:
render('assets/transcription.ogg', [
v.events(position=0, gain=2, sustain=0.05, fade=(0.0, 0.05)),
cl.events(position=1, delay=0.)
])
[24]:
(Output: "assets/transcription.ogg", 2 channels, 20.10 secs, 44100 Hz)