2. Transcription

Transcribing audio into a musical representation is not a completely objective process. The kind of transcription and the methods used depend heavily on its purpose, that is, on what the result will be used for. maelzel provides a transcribe package which implements multiple transcription strategies.

2.1. Voice Analysis / Transcription

When transcribing a human voice, which is a monophonic source with a highly harmonic timbre in the pitched parts of speech or song, the most appropriate method is probably one based on analysis of the fundamental frequency, combined with onset/offset prediction and other secondary features.

maelzel.transcribe.FundamentalAnalysisMonophonic implements the skeleton of such an approach:

  1. Onset detection

  2. The fundamental is sampled within each onset-offset timespan to include any pitch inflections.

  3. A list of “gestures” is generated, where each gesture consists of a series of breakpoints. A breakpoint holds information about pitch, amplitude, voicedness (how much pitch content the voice has) and other features at a given time.

  4. Unpitched sections are analyzed using secondary features, like centroid, to characterize them in more detail.
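
The steps above can be illustrated with a small plain-Python sketch. It is not maelzel's implementation: the `Breakpoint` class, the `sample_gestures` helper and the hand-written pitch track are hypothetical stand-ins, and onset detection is assumed to have already produced the onset-offset spans.

```python
from dataclasses import dataclass


@dataclass
class Breakpoint:
    """One sampled point of a gesture (hypothetical structure)."""
    time: float    # seconds
    freq: float    # Hz, 0 when no fundamental was detected
    amp: float     # linear amplitude
    voiced: float  # 0 (noise) .. 1 (fully pitched)


def sample_gestures(track, spans):
    """Collect the frames of a pitch track into one gesture per span.

    track: list of Breakpoint sorted by time (assumed pitch-tracker output)
    spans: list of (onset, offset) pairs in seconds
    """
    gestures = []
    for onset, offset in spans:
        gesture = [bp for bp in track if onset <= bp.time < offset]
        if gesture:
            gestures.append(gesture)
    return gestures


track = [
    Breakpoint(0.00, 150.0, 0.10, 0.9),
    Breakpoint(0.05, 152.0, 0.11, 0.9),
    Breakpoint(0.10, 0.0, 0.01, 0.0),   # falls between the two spans
    Breakpoint(0.20, 170.0, 0.09, 0.8),
]
gestures = sample_gestures(track, spans=[(0.0, 0.1), (0.2, 0.3)])
for gesture in gestures:
    print([(bp.time, bp.freq) for bp in gesture])
```

Keeping every frame inside a span, instead of one averaged pitch per span, is what preserves pitch inflections within a gesture.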

TODO: it might be desirable to extend the analysis of the unpitched sections using more detailed features like mfb or mfcc. For voice transcription it would also be relevant to perform formant analysis and vowel/phoneme prediction to enrich the transcription.

[1]:
from maelzel.snd.audiosample import Sample
from maelzel import transcribe
from maelzel.core import *
import matplotlib.pyplot as plt

2.2. Configuration

When performing automatic transcription, the quantization results tend to become very complicated. For this reason it is important to limit the complexity allowed by the quantization algorithm in order to keep the results readable. As will be seen later, the resulting transcription remains very close to the original even with low-complexity quantization.

[2]:
cfg = getConfig()
cfg['quant.complexity'] = 'low'
cfg['show.centsDeviationAsTextAnnotation'] = False   # Disable text annotations for cents deviation for each note
cfg['show.horizontalSpacing'] = 'large'              # More readable since every note probably has its own accidental
cfg['show.lilypondGlissandoMinimumLength'] = 3       # Makes sure that glissando lines are always shown
[3]:
# s = Sample('../snd/Numbers_EnglishFemale.flac')
# s = Sample('../snd/colours-german-male.flac')
# s = Sample("~/tmp/gliss.flac")
# s = Sample('../snd/finneganswake-fragm01.flac')

# s = Sample('../snd/voiceover-fragment-48k.flac')
s = Sample('snd/istambul2.flac')


# Only the left channel
s0 = s.getChannel(0, contiguous=True)
s0
[3]:
Sample(duration=20, sr=44100, numchannels=1)

Now we perform the analysis itself. Many parameters can be customized, but the defaults generally produce viable results. The analysis is then plotted to show the utterances split by onsets. Within each onset-offset range the predicted (and already simplified) fundamental is shown as a line. Zero-frequency sections represent regions where the confidence of the fundamental pitch prediction was too low.

[52]:
analysis = transcribe.FundamentalAnalysisMonophonic(s0.samples,
                                              sr=s0.sr,
                                              # Quantize the pitch to its nearest 1/8th tone
                                              semitoneQuantization=4,
                                              minSilence=0.001,
                                              # Mark which are the most salient onsets
                                              accentPercentile=0.1,
                                              # Simplify the pitch contour
                                              simplify=0.08,
                                              # Default: 0.07, lower for more onset sensitivity
                                              onsetThreshold=0.05,
                                              # Unvoiced utterances softer than this are not taken into
                                              # consideration since they are often part of the
                                              # background noise
                                              unvoicedMinAmplitudePercentile=0.3,
                                              overlap=16,
                                              onsetOverlap=8,
                                              onsetBacktrack=True)
plt.figure(figsize=(20, 6))
analysis.plot(spanAlpha=0.2)

mnOut size: 6891
m_pitchTrack size: 6891
../_images/notebooks_demo-transcribe_6_1.png
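
The effect of `semitoneQuantization=4` (four steps per semitone, i.e. 1/8th tones) can be sketched with standard pitch arithmetic. This is an illustration only and not the code maelzel runs: the helper names are hypothetical, and the reference pitch A4 = 442 Hz is an assumption (the frequencies reported by the analysis appear to be consistent with it).

```python
import math


def freq_to_midi(freq, a4=442.0):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return 69 + 12 * math.log2(freq / a4)


def midi_to_freq(midi, a4=442.0):
    """Convert a fractional MIDI note number back to Hz."""
    return a4 * 2 ** ((midi - 69) / 12)


def quantize_freq(freq, divisions_per_semitone=4, a4=442.0):
    """Snap a frequency to the nearest step of a microtonal grid.

    divisions_per_semitone=4 gives 1/8th tones. A frequency of 0
    (no pitch detected) is passed through unchanged.
    """
    if freq <= 0:
        return 0.0
    midi = freq_to_midi(freq, a4)
    snapped = round(midi * divisions_per_semitone) / divisions_per_semitone
    return midi_to_freq(snapped, a4)


for f in (163.5, 150.0, 0.0):
    print(f"{f:8.2f} Hz -> {quantize_freq(f):8.4f} Hz")
```

A coarser grid (for example `divisions_per_semitone=2` for quarter tones) yields fewer distinct pitches and therefore simpler accidentals in the resulting score.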

The analysis contains a series of breakpoints. These breakpoints are grouped by onset/offset region and stored in the .groups attribute of the analysis. A group contains all the breakpoints between an onset and the next onset. Notice that a group may or may not have an offset: a group without an offset indicates that there is no interruption between the end of that group and the start of the next. Keep in mind that onset prediction is based on spectral flux: an onset indicates a significant change of spectrum within a given period of time.

[27]:
analysis
[27]:
time     freq      amp     voiced  linked  onsetStrength  freqConfidence  kind    isaccent  duration  properties
0.3947   154.0302  0.0012  False   True    0.3751         0.0000          onset   False     0.3135
0.7082   151.8219  0.1181  False   False   8.0438         0.0000                  False     0.0000
0.7082   172.8930  0.1181  False   True    8.0438         0.0000          onset   True      0.1625
0.8707   139.2213  0.1047  False   False   2.4267         0.0000                  False     0.0000
0.8707   167.9711  0.1047  False   True    2.4267         0.0000          onset   True      0.1258
0.9966   175.4078  0.0946  True    True    0.4189         0.6834                  False     0.0599
1.0565   163.1893  0.1097  True    False   0.8343         0.7343                  False     0.0000
1.0565   163.1893  0.1097  True    True    0.8343         0.7343          onset   False     0.1858
1.2423   165.5629  0.1000  True    False   1.1559         0.5815                  False     0.0000
1.2423   165.5629  0.1000  True    True    1.1559         0.5815          onset   True      0.2612
1.5035   167.9711  0.0013  False   False   1.7849         0.0000                  False     0.0000
1.5035   0.0000    0.0013  False   True    1.7849         0.0000          onset   True      0.2322
1.7357   0.0000    0.0239  False   False   0.3449         0.0000                  False     0.0000
1.7357   147.4998  0.0239  False   True    0.3449         0.0000          onset   False     0.1078
1.8434   129.5234  0.0856  True    True    0.3591         1.0000                  False     0.0200
1.8634   158.5436  0.0683  True    False   4.5178         0.0176                  False     0.0000
1.8634   158.5436  0.0683  True    True    4.5178         0.0176          onset   True      0.1858
2.0492   156.2706  0.0302  False   False   3.3984         0.0000                  False     0.0000
2.0492   129.5234  0.0302  False   True    3.3984         0.0000          onset   True      0.0734
2.1226   129.5234  0.0288  True    True    0.6460         0.0413                  False     0.0734
2.1960   101.3289  0.0152  True    True    0.3784         0.4973                  False     0.2618
2.4578   99.8762   0.0009  False   False   0.8554         0.0000          offset  False     0.0093
2.4671   0.0000    0.0009  False   True    0.5128         0.0000          onset   False     0.0619
2.5290   0.0000    0.0012  False   False   0.2495         0.0000          offset  False     0.1645
2.6935   0.0000    0.0014  False   True    1.2353         0.0000          onset   True      0.0871
2.7806   0.0000    0.0013  False   False   1.8610         0.0000                  False     0.0000
2.7806   0.0000    0.0013  False   True    1.8610         0.0000          onset   True      0.5367
3.3173   0.0000    0.0011  False   False   0.2848         0.0000          offset  False     0.4675
3.7849   165.5629  0.0921  False   True    3.2361         0.0000          onset   True      0.1741
3.9590   131.4074  0.1086  False   False   3.1797         0.0000                  False     0.0000
3.9590   165.5629  0.1086  False   True    3.1797         0.0000          onset   True      0.1683
4.1273   172.8930  0.1093  True    False   0.4700         0.5129                  False     0.0000
4.1273   172.8930  0.1093  True    True    0.4700         0.5129          onset   False     0.2090
4.3363   165.5629  0.0966  True    False   1.3259         0.5701                  False     0.0000
4.3363   165.5629  0.0966  True    True    1.3259         0.5701          onset   True      0.0578
4.3941   177.9592  0.0943  True    True    0.7116         0.4555                  False     0.2151
4.6092   172.8930  0.0084  True    False   0.5496         0.7848                  False     0.0000
4.6092   172.8930  0.0084  True    True    0.5496         0.7848          onset   False     0.0397
4.6489   147.4998  0.0027  True    True    0.4243         0.3823                  False     0.1847
4.8336   147.4998  0.0476  False   True    1.1359         0.0202                  False     0.0020
4.8356   158.5436  0.0479  True    False   1.1429         0.0639                  False     0.0000
4.8356   158.5436  0.0479  True    True    1.1429         0.0639          onset   True      0.0773
4.9129   149.6452  0.1306  True    True    1.2890         0.5596                  False     0.0754
4.9883   172.8930  0.0893  True    True    0.9242         0.6797                  False     0.1666
5.1548   137.2253  0.0983  False   False   2.9877         0.0111                  False     0.0000
5.1548   160.8497  0.0983  False   True    2.9877         0.0111          onset   True      0.0813
5.2361   163.1893  0.0440  True    False   2.0383         0.6287                  False     0.0000
5.2361   163.1893  0.0440  True    True    2.0383         0.6287          onset   True      0.1509
5.3870   156.2706  0.1171  False   False   0.8719         0.0000                  False     0.0000
5.3870   194.0659  0.1171  False   True    0.8719         0.0000          onset   False     0.0773
5.4643   191.2836  0.0657  True    True    0.5144         0.4880                  False     0.1486
5.6129   156.2706  0.0161  True    True    0.5846         0.1090                  False     0.1863
5.7992   156.2706  0.0653  False   False   2.8919         0.0000                  False     0.0000
5.7992   163.1893  0.0653  False   True    2.8919         0.0000          onset   True      0.1270
5.9262   145.3851  0.0830  True    True    0.7192         0.5901                  False     0.0933
6.0194   167.9711  0.0631  True    True    0.6940         0.5892                  False     0.1687
6.1881   158.5436  0.0177  True    False   0.5140         0.6107                  False     0.0000
6.1881   158.5436  0.0177  True    True    0.5140         0.6107          onset   False     0.1045
6.2926   131.4074  0.0800  True    False   1.9536         0.5357                  False     0.0000
6.2926   131.4074  0.0800  True    True    1.9536         0.5357          onset   True      0.0300
6.3226   154.0302  0.0846  True    True    2.6586         0.0401                  False     0.1500
6.4726   160.8497  0.0449  True    False   2.0648         0.5671                  False     0.0000
6.4726   160.8497  0.0449  True    True    2.0648         0.5671          onset   True      0.0464
6.5190   154.0302  0.0336  True    False   1.4298         0.6445                  False     0.0000
6.5190   154.0302  0.0336  True    True    1.4298         0.6445          onset   True      0.0697
6.5887   141.2463  0.1048  False   False   1.0312         0.0000                  False     0.0000
6.5887   160.8497  0.1048  False   True    1.0312         0.0000          onset   True      0.1335
6.7222   170.4143  0.0606  True    False   0.1587         0.6729                  False     0.0000
6.7222   170.4143  0.0606  True    True    0.1587         0.6729          onset   False     0.0639
6.7860   158.5436  0.0633  True    False   1.7349         0.0800                  False     0.0000
6.7860   158.5436  0.0633  True    True    1.7349         0.0800          onset   True      0.1103
6.8963   135.2579  0.0863  False   False   2.6376         0.0000                  False     0.0000
6.8963   147.4998  0.0863  False   True    2.6376         0.0000          onset   True      0.1330
7.0293   163.1893  0.0878  True    True    0.5457         0.7880                  False     0.1548
7.1841   205.6056  0.0134  True    True    0.4116         0.4224                  False     0.0198
7.2040   191.2836  0.0076  True    False   1.0103         0.5191                  False     0.0000
7.2040   191.2836  0.0076  True    True    1.0103         0.5191          onset   False     0.6162
7.8202   191.2836  0.0010  False   False   0.4973         0.0000          offset  False     0.1094
7.9296   158.5436  0.0230  False   True    1.5386         0.0000          onset   True      0.0755
8.0051   156.2706  0.0193  True    False   0.4818         0.5666                  False     0.0000
8.0051   156.2706  0.0193  True    True    0.4818         0.5666          onset   False     0.0813
8.0863   131.4074  0.1218  False   False   5.3221         0.0000                  False     0.0000
8.0863   163.1893  0.1218  False   True    5.3221         0.0000          onset   True      0.1706
8.2570   143.3008  0.1423  False   True    1.2315         0.0079                  False     0.0357
8.2927   172.8930  0.1197  True    True    0.7477         0.5757                  False     0.1071
8.3998   170.4143  0.1363  True    False   0.7505         0.7437                  False     0.0000
8.3998   170.4143  0.1363  True    True    0.7505         0.7437          onset   False     0.1804
8.5802   158.5436  0.0878  True    True    0.2578         0.5662                  False     0.0892
8.6695   183.1738  0.0902  True    True    0.4079         0.4511                  False     0.1150
8.7844   191.2836  0.0552  True    True    0.4193         0.4455                  False     0.2201
9.0045   183.1738  0.0313  False   True    1.0990         0.0116                  False     0.0397
9.0442   160.8497  0.0272  True    False   2.2514         0.4280                  False     0.0000
9.0442   160.8497  0.0272  True    True    2.2514         0.4280          onset   True      0.0900
9.1341   139.2213  0.0744  False   True    1.1895         0.0158                  False     0.0180
9.1521   160.8497  0.0583  True    True    0.7970         0.0981                  False     0.0720
9.2241   141.2463  0.0113  False   False   1.6149         0.0919                  False     0.0000
9.2241   0.0000    0.0113  False   True    1.6149         0.0919          onset   True      0.0580
9.2822   0.0000    0.0639  False   False   1.9128         0.0000                  False     0.0000
9.2822   167.9711  0.0639  False   True    1.9128         0.0000          onset   True      0.1189
9.4011   158.5436  0.1219  True    True    0.4153         0.6361                  False     0.1249
9.5260   165.5629  0.0869  True    False   2.7008         0.0919                  False     0.0000
9.5260   165.5629  0.0869  True    True    2.7008         0.0919          onset   True      0.1053
9.6313   194.0659  0.0198  True    True    0.5172         0.5055                  False     0.1153
9.7466   194.0659  0.0744  False   False   5.5841         0.0000                  False     0.0000
9.7466   177.9592  0.0744  False   True    5.5841         0.0000          onset   True      0.0522
9.7988   175.4078  0.1098  True    False   1.5366         0.1470                  False     0.0000
9.7988   175.4078  0.1098  True    True    1.5366         0.1470          onset   True      0.1335
9.9323   160.8497  0.0660  True    False   0.9682         0.6589                  False     0.0000
9.9323   160.8497  0.0660  True    True    0.9682         0.6589          onset   False     0.0614
9.9938   145.3851  0.0477  True    True    0.2198         0.5126                  False     0.1824
10.1761  139.2213  0.0071  True    False   1.6309         0.5569                  False     0.0000
10.1761  139.2213  0.0071  True    True    1.6309         0.5569          onset   True      0.0871
10.2632  133.3188  0.0062  False   False   3.5289         0.0000                  False     0.0000
10.2632  0.0000    0.0062  False   True    3.5289         0.0000          onset   True      0.0987
10.3619  0.0000    0.0865  False   False   4.7550         0.0000                  False     0.0000
10.3619  175.4078  0.0865  False   True    4.7550         0.0000          onset   True      0.0417
10.4036  165.5629  0.0888  True    True    1.4461         0.4936                  False     0.1211
10.5247  194.0659  0.0080  True    True    0.3207         0.5469                  False     0.1926
10.7173  185.8381  0.0010  False   False   0.2314         0.0000          offset  False     0.7186
11.4358  133.3188  0.0926  True    True    3.4477         0.2550          onset   True      0.0302
11.4660  158.5436  0.1021  True    True    0.4259         0.4335                  False     0.1207
11.5868  160.8497  0.0963  True    False   2.0686         0.5882                  False     0.0000
11.5868  160.8497  0.0963  True    True    2.0686         0.5882          onset   True      0.0929
11.6796  158.5436  0.0520  False   False   2.1780         0.0875                  False     0.0000
11.6796  0.0000    0.0520  False   True    2.1780         0.0875          onset   True      0.1219
11.8015  0.0000    0.1026  False   False   0.7592         0.0000                  False     0.0000
11.8015  194.0659  0.1026  False   True    0.7592         0.0000          onset   False     0.0464
11.8480  188.5412  0.0543  True    False   1.4402         0.2057                  False     0.0000
11.8480  188.5412  0.0543  True    True    1.4402         0.2057          onset   True      0.0481
11.8961  185.8381  0.0345  True    True    0.6462         0.6121                  False     0.0622
11.9583  139.2213  0.0656  True    False   4.2064         0.0282                  False     0.0000
11.9583  139.2213  0.0656  True    True    4.2064         0.0282          onset   True      0.0100
11.9682  158.5436  0.0712  True    True    3.3945         0.0100                  False     0.1115
12.0798  165.5629  0.0401  True    True    1.2763         0.6301                  False     0.0817
12.1615  151.8219  0.1227  False   False   2.8366         0.0000                  False     0.0000
12.1615  167.9711  0.1227  False   True    2.8366         0.0000          onset   True      0.2612
12.4227  170.4143  0.0518  False   False   6.2205         0.0000                  False     0.0000
12.4227  156.2706  0.0518  False   True    6.2205         0.0000          onset   True      0.2089
12.6315  143.3008  0.0479  True    True    0.7659         0.0206                  False     0.0756
12.7071  147.4998  0.0423  True    False   1.3421         0.5800                  False     0.0000
12.7071  147.4998  0.0423  True    True    1.3421         0.5800          onset   True      0.0639
12.7710  135.2579  0.0285  True    False   1.4383         0.4920                  False     0.0000
12.7710  135.2579  0.0285  True    True    1.4383         0.4920          onset   True      0.0987
12.8697  129.5234  0.0680  False   False   1.3549         0.0000                  False     0.0000
12.8697  172.8930  0.0680  False   True    1.3549         0.0000          onset   True      0.1277
12.9974  133.3188  0.0859  True    False   0.1963         0.2329                  False     0.0000
12.9974  133.3188  0.0859  True    True    0.1963         0.2329          onset   False     0.0378
13.0351  154.0302  0.0821  True    True    2.0038         0.0100                  False     0.1828
13.2180  137.2253  0.0111  True    False   0.8560         0.1032                  False     0.0000
13.2180  137.2253  0.0111  True    True    0.8560         0.1032          onset   False     0.0678
13.2858  120.5011  0.0029  True    True    0.5886         0.0913                  False     0.1934
13.4792  131.4074  0.0567  False   False   4.6962         0.0000                  False     0.0000
13.4792  163.1893  0.0567  False   True    4.6962         0.0000          onset   True      0.1393
13.6185  156.2706  0.0226  False   False   4.8705         0.0000                  False     0.0000
13.6185  151.8219  0.0226  False   True    4.8705         0.0000          onset   True      0.1103
13.7288  131.4074  0.0329  False   False   1.9961         0.0000                  False     0.0000
13.7288  139.2213  0.0329  False   True    1.9961         0.0000          onset   True      0.1277
13.8565  139.2213  0.0307  True    False   1.2858         0.0101                  False     0.0000
13.8565  139.2213  0.0307  True    True    1.2858         0.0101          onset   True      0.1071
13.9636  95.6418   0.0137  True    True    0.2511         0.2286                  False     0.0813
14.0449  97.0329   0.0295  True    True    0.2771         0.1815                  False     0.0436
14.0885  113.7379  0.0256  True    True    0.7078         0.6753                  False     0.1249
14.2134  97.0329   0.0040  True    True    0.3597         0.7748                  False     0.1269
14.3404  102.8028  0.0014  False   False   0.1906         0.0000          offset  False     0.7526
15.0930  160.8497  0.0485  False   True    0.9308         0.0000          onset   False     0.1954
15.2883  145.3851  0.0756  False   True    3.0332         0.0031                  False     0.0020
15.2903  163.1893  0.0754  True    False   2.2525         0.0100                  False     0.0000
15.2903  163.1893  0.0754  True    True    2.2525         0.0100          onset   True      0.1451
15.4355  141.2463  0.1522  False   False   4.3779         0.0000                  False     0.0000
15.4355  199.7524  0.1522  False   True    4.3779         0.0000          onset   True      0.0773
15.5128  202.6579  0.0811  True    True    0.4225         0.4576                  False     0.1229
15.6357  154.0302  0.0699  True    True    0.4427         0.5781                  False     0.1566
15.7924  194.0659  0.0038  True    True    0.3769         0.6508                  False     0.1249
15.9173  180.5477  0.0014  False   False   2.4315         0.0000                  False     0.0000
15.9173  0.0000    0.0014  False   True    2.4315         0.0000          onset   True      0.4228
16.3401  0.0000    0.0010  False   False   0.1324         0.0000          offset  False     0.1055
16.4455  158.5436  0.1093  False   True    3.4108         0.0000          onset   True      0.0986
16.5441  147.4998  0.1001  True    True    0.6687         0.4285                  False     0.0523
16.5965  163.1893  0.1416  True    False   1.9066         0.7165                  False     0.0000
16.5965  163.1893  0.1416  True    True    1.9066         0.7165          onset   True      0.0871
16.6835  205.6056  0.0257  True    False   1.5198         0.5117                  False     0.0000
16.6835  205.6056  0.0257  True    True    1.5198         0.5117          onset   True      0.0580
16.7416  205.6056  0.0482  True    False   0.1604         0.4567                  False     0.0000
16.7416  205.6056  0.0482  True    True    0.1604         0.4567          onset   False     0.3541
17.0957  205.6056  0.1146  False   False   3.0281         0.0115                  False     0.0000
17.0957  165.5629  0.1146  False   True    3.0281         0.0115          onset   True      0.0834
17.1791  154.0302  0.0881  True    True    0.1235         0.7391                  False     0.1430
17.3221  160.8497  0.0149  True    False   2.7342         0.6961                  False     0.0000
17.3221  160.8497  0.0149  True    True    2.7342         0.6961          onset   True      0.0580
17.3801  170.4143  0.0397  True    False   1.8166         0.2023                  False     0.0000
17.3801  170.4143  0.0397  True    True    1.8166         0.2023          onset   True      0.2496
17.6298  170.4143  0.0433  False   False   2.7046         0.0000                  False     0.0000
17.6298  167.9711  0.0433  False   True    2.7046         0.0000          onset   True      0.1103
17.7400  163.1893  0.0909  True    False   1.9407         0.0721                  False     0.0000
17.7400  163.1893  0.0909  True    True    1.9407         0.0721          onset   True      0.1031
17.8431  149.6452  0.0154  True    True    0.4730         0.6143                  False     0.1388
17.9819  149.6452  0.0974  False   True    3.2106         0.0032                  False     0.0020
17.9839  170.4143  0.0981  True    False   2.3812         0.0100                  False     0.0000
17.9839  170.4143  0.0981  True    True    2.3812         0.0100          onset   True      0.2557
18.2396  175.4078  0.0690  True    True    0.3838         0.5010                  False     0.1506
18.3902  151.8219  0.0324  True    False   1.4884         0.8043                  False     0.0000
18.3902  151.8219  0.0324  True    True    1.4884         0.8043          onset   True      0.0813
18.4715  151.8219  0.0676  True    False   0.6535         0.1022                  False     0.0000
18.4715  151.8219  0.0676  True    True    0.6535         0.1022          onset   False     0.0987
18.5702  156.2706  0.0422  True    False   2.0092         0.6058                  False     0.0000
18.5702  156.2706  0.0422  True    True    2.0092         0.6058          onset   True      0.1741
18.7443  158.5436  0.0527  True    False   1.2417         0.4150                  False     0.0000
18.7443  158.5436  0.0527  True    True    1.2417         0.4150          onset   True      0.1378
18.8821  151.8219  0.0496  False   True    0.9473         0.0047                  False     0.0020
18.8841  221.0000  0.0457  True    True    1.0064         0.0100                  False     0.1098
18.9939  208.5962  0.0078  False   False   5.7084         0.0000                  False     0.0000
18.9939  205.6056  0.0078  False   True    5.7084         0.0000          onset   True      0.0929
19.0868  208.5962  0.0637  False   False   4.8244         0.0000                  False     0.0000
19.0868  160.8497  0.0637  False   True    4.8244         0.0000          onset   True      0.1191
19.2059  133.3188  0.0469  True    True    0.3761         0.3766                  False     0.1131
19.3190  133.3188  0.0499  True    False   0.5350         0.5701                  False     0.0000
19.3190  133.3188  0.0499  True    True    0.5350         0.5701          onset   False     0.1509
19.4699  131.4074  0.0480  True    False   1.8313         0.3762                  False     0.0000
19.4699  131.4074  0.0480  True    True    1.8313         0.3762          onset   True      0.1404
19.6104  125.8361  0.0730  True    True    0.3846         0.0949                  False     0.1068
19.7172  183.1738  0.0180  True    True    0.5501         0.4861                  False     0.2828
20.0000  194.0659  0.0011  False   False   0.4973         0.0000                  False     0.0000
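
The grouping rule can be checked against a few rows of the listing above. The following is a minimal sketch, assuming that a row whose kind is `onset` opens a new group and that a group has an offset when its last row is marked `offset`. The `split_groups` and `has_offset` helpers are hypothetical and not part of maelzel; only the times and kinds are taken from the listing.

```python
def split_groups(rows):
    """Split a flat list of breakpoint rows into groups.

    rows: list of dicts with a 'time' and a 'kind' key, where kind is
    'onset', 'offset' or '' (an inner or closing breakpoint).
    """
    groups = []
    for row in rows:
        if row['kind'] == 'onset' or not groups:
            groups.append([])
        groups[-1].append(row)
    return groups


def has_offset(group):
    """True if the group is followed by a gap, False if the next group is contiguous."""
    return group[-1]['kind'] == 'offset'


rows = [
    {'time': 2.0492, 'kind': 'onset'},
    {'time': 2.1226, 'kind': ''},
    {'time': 2.1960, 'kind': ''},
    {'time': 2.4578, 'kind': 'offset'},
    {'time': 2.4671, 'kind': 'onset'},
    {'time': 2.5290, 'kind': 'offset'},
    {'time': 2.6935, 'kind': 'onset'},
    {'time': 2.7806, 'kind': ''},        # closes the group, no gap follows
    {'time': 2.7806, 'kind': 'onset'},
    {'time': 3.3173, 'kind': 'offset'},
]
groups = split_groups(rows)
for group in groups:
    print(group[0]['time'], group[-1]['time'], has_offset(group))
```

Note how the group starting at 2.6935 ends at 2.7806 without an offset, exactly where the next group begins.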

The transcription itself transforms each group of breakpoints into a series of notes. By default groups are enclosed within slurs and accents mark particularly salient onsets. An ‘x’ notehead indicates that the sound was noisy / unpitched at that moment.

[56]:
options = transcribe.TranscriptionOptions(unvoicedPitch='4G', addSlurs=True, debug=False, addGliss=True, addAccents=True)
v = analysis.transcribe(options=options)
v
[maelzel.scoring:notation.py:536:addSpanner:WARNING] A Notation cannot be assigned both start and end of a spanner. Removing the partner spannerself=«3D+ 1.625:1.75 1/8♩ noteheads=['0:cross'] attachments=[Articulation(kind=accent)] spanners=[Slur(kind=start, linetype=solid, nestingLevel=1, uuid=rdmciucp)]», spanner=Slur(kind=end, linetype=solid, nestingLevel=1, uuid=rdmciucp), partner=Slur(kind=start, linetype=solid, nestingLevel=1, uuid=rdmciucp), end=None
[maelzel.scoring:renderlily.py:583:_handleSpannerPost:ERROR] Two many nested slurs: Slur(kind=start, linetype=solid, nestingLevel=5, parent=«3E>gliss 0.9:1 1/10♩ 5/4 noteheads=['0:cross'] attachments=[Articulation(kind=accent)] spanners=[Slur(kind=start, linetype=solid, nestingLevel=5, parent=«3E>gliss 0.9:1 1/10♩ 5/4 noteheads=['0:cross'] attachments=[Articulation(kind=accent)] spanners=[...]», uuid=kda2ogjp)]», uuid=kda2ogjp), skipping
[maelzel.scoring:renderlily.py:583:_handleSpannerPost:ERROR] Two many nested slurs: Slur(kind=start, linetype=solid, nestingLevel=4, parent=«3Eb+25gliss 0.833:0.917 1/12♩ 3/2 attachments=[Articulation(kind=accent)] spanners=[Slur(kind=start, linetype=solid, nestingLevel=4, parent=«3Eb+25gliss 0.833:0.917 1/12♩ 3/2 attachments=[Articulation(kind=accent)] spanners=[...]», uuid=i3nqfv21)]», uuid=i3nqfv21), skipping
[maelzel.scoring:renderlily.py:583:_handleSpannerPost:ERROR] Two many nested slurs: Slur(kind=start, linetype=solid, nestingLevel=3, parent=«3E-gliss 1.917:2 1/12♩ 3/2 spanners=[Slur(kind=start, linetype=solid, nestingLevel=3, parent=«3E-gliss 1.917:2 1/12♩ 3/2 spanners=[...]», uuid=9oqjjj3l)]», uuid=9oqjjj3l), skipping
[56]:
Voice([3Eb<:-58dB:0.313♩:offset=0.395:symbols=[Notehead(shape=cross)], 3F<:-19dB:0.163♩:offset=0.708:symbols=[Notehead(shape=cross), Articulation(kind=accent)], 3E>:-20dB:0.126♩:offset=0.871:gliss=True:symbols=[Notehead(shape=cross), Articulation(kind=accent), Slur(anchor=Slur, kind=start, linetype=solid, partnerSpanner=Slur, uuid=kda2ogjp)], 3F:-20dB:0.06♩:offset=0.997:symbols=[Slur(anchor=Slur, kind=end, linetype=solid, partnerSpanner=Slur, uuid=kda2ogjp)], 3E<:-19dB:0.186♩:offset=1.057, 3E:-20dB:0.179♩:offset=1.242:gliss=True:symbols=[Articulation(kind=accent), Slur(anchor=Slur, kind=start, linetype=solid, partnerSpanner=Slur, uuid=zlc78b5i)], 3F<:-41dB:0.036♩:offset=1.422:gliss=True, 3D+:-53dB:0.046♩:offset=1.458:symbols=[Slur(anchor=Slur, kind=end, linetype=solid, partnerSpanner=Slur, uuid=zlc78b5i)], 4G:-57dB:0.232♩:offset=1.503:symbols=[Notehead(shape=cross), Articulation(kind=accent)], 3D:-32dB:0.108♩:offset=1.736:gliss=True:symbols=[Notehead(shape=cross), Slur(anchor=Slur, kind=start, linetype=solid, partnerSpanner=Slur, uuid=7108uupy)], …], dur=20, offset=0)
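
The dB values shown for each note are consistent with the standard conversion from linear amplitude, 20·log10(amp), applied to the amplitude of the first breakpoint of each group. The sketch below illustrates this for the first three notes; `amp_to_db` is a hypothetical helper written for this example, not a maelzel function.

```python
import math


def amp_to_db(amp, floor_db=-120.0):
    """Convert a linear amplitude to decibels, clamping silence to floor_db."""
    if amp <= 0:
        return floor_db
    return 20 * math.log10(amp)


# (onset time, amplitude) of the first three groups in the analysis listing
for time, amp in [(0.3947, 0.0012), (0.7082, 0.1181), (0.8707, 0.1047)]:
    print(f"t={time:.3f}s  amp={amp:.4f}  {round(amp_to_db(amp))} dB")
```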

2.3. Playback

Below is a simple approach to sonifying the transcription: different presets are defined for sounds which are fully voiced, partially voiced or completely unpitched. With better feature detection for the unpitched sections, and with formant prediction, it might be possible to produce a much more accurate result.

[61]:
defPreset('unpitched', r"""
|icentroid, iq=20|
asig pinker
asig *= kamp
iband = icentroid / iq
aout1 resonr asig, icentroid, iband
""")

defPreset('unvoiced', r"""
|icentroid, iq=20|
asig vco2 1, kfreq
anoise = pinker() * 0.1
iband = icentroid / iq
anoise resonr anoise, icentroid, iband
aout1 = (asig + anoise) * kamp
""");
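
Both presets derive the bandwidth of the `resonr` filter as `icentroid / iq`, so a higher `iq` gives a narrower, more resonant band of noise around the centroid. The arithmetic is sketched below; `resonator_bandwidth` is a hypothetical helper and the centroid value is made up for illustration.

```python
def resonator_bandwidth(centroid_hz, q):
    """Bandwidth in Hz of a band-pass filter centred on centroid_hz."""
    if q <= 0:
        raise ValueError("q must be positive")
    return centroid_hz / q


centroid = 3000.0  # hypothetical spectral centroid in Hz
for q in (10, 20, 40):
    band = resonator_bandwidth(centroid, q)
    print(f"q={q:2d} -> band of {band:.0f} Hz around {centroid:.0f} Hz")
```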
[69]:
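for n in v.items:
    if n.playargs:
        n.playargs._checkArgs()
    # Choose a preset according to how pitched each note is
    if n.getProperty('voiced'):
        # Fully voiced notes use the 'saw' preset
        n.setPlay(instr='saw', args={'kcutoffratio': 8})
    elif n.getProperty('unvoicedGroup'):
        # Notes within a completely unpitched group: filtered noise only
        n.setPlay(instr='unpitched', args={'icentroid': n.getProperty('centroid'), 'iq': 10}, gain=1, fade=0)
    else:
        # Partially voiced notes: oscillator mixed with filtered noise
        n.setPlay(instr='unvoiced', args={'icentroid': n.getProperty('centroid'), 'iq': 40}, gain=1)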

We load the original sound into a Clip in order to easily play it in sync with the analysis.

[63]:
cl = Clip(s.path)
mnOut size: 22
m_pitchTrack size: 22
mnOut size: 22
m_pitchTrack size: 22
mnOut size: 22
m_pitchTrack size: 22
mnOut size: 22
m_pitchTrack size: 22

Normally one would call play as in the cell below. Here the result is also rendered to disk so that the generated audio can be played online.

[75]:
play(
    v.events(position=0, gain=2, sustain=0.05, fade=(0.01, 0.05)),
    cl.events(position=1, delay=0.)
)
[75]:
SynthGroup(synths=85)

Instr: preset:unvoiced - 31 synths

p1          start  dur    p4  5:kpos  6:kgain  7:idataidx_  8:inumbps  9:ibplen  10:ichan  11:ifadein  ...
502.0451 𝍪  0.426  0.313  0   0       1        17           2          3         1         ...
502.0452 𝍪  0.739  0.163  0   0       1        17           2          3         1         ...
502.0453 𝍪  0.902  0.236  0   0       1        17           4          3         1         ...
502.0454 𝍪  1.767  0.178  0   0       1        17           4          3         1         ...
...

Instr: preset:saw - 45 synths

p1          start  dur    p4  5:kpos  6:kgain  7:idataidx_  8:inumbps  9:ibplen  10:ichan  11:ifadein  12:ifadeout  13:ipchintrp_  14:ifadekind  15:ktransp  16:klag  17:kcutoffratio  18:kfilterq  19  20     21        22       23   ...
503.0869 𝍪  1.087  0.186  0   0       2        19           2          3         1         0.01        0.05         0              1             0           0.1      8                3            0   51.75  0.10966   0.18576  ...
503.087 𝍪   1.273  0.311  0   0       2        19           5          3         1         0.01        0.05         0              1             0           0.1      8                3            0   52     0.099954  0.17947  ...
503.0871 𝍪  1.894  0.186  0   0       2        19           2          3         1         0.01        0.05         0              1             0           0.1      8                3            0   51.25  0.068307  0.18576  ...
503.0872 𝍪  4.158  0.209  0   0       2        19           2          3         1         0.01        0.05         0              1             0           0.1      8                3            0   52.75  0.10933   0.20898  ...
...

Instr: preset:unpitched - 8 synths

p1          start  dur    p4  5:kpos  6:kgain  7:idataidx_  8:inumbps  9:ibplen  10:ichan  11:ifadein  ...
501.0118 𝍪  1.534  0.282  0   0       1        17           3          3         1         ...
501.0119 𝍪  2.498  0.112  0   0       1        17           3          3         1         ...
501.012 𝍪   2.724  0.137  0   0       1        17           3          3         1         ...
501.0121 𝍪  2.811  0.587  0   0       1        17           3          3         1         ...
...

Instr: preset:_clip_diskin - 1 synths

p1          start  dur     p4  5:kpos  6:kgain  7:idataidx_  8:inumbps  9:ibplen  10:ichan  11:ifadein  12:ifadeout  13:ipchintrp_  14:ifadekind  15:ipath            16:isndfilechan  17:kspeed  18:iskip  19:iwrap  20:iwinsize  21  22   ...
504.0016 𝍪  0.031  20.000  0   1       1        21           2          3         1         0.02        0.02         0              1             snd/istambul2.flac  -1               1          0         0         4            0   ...
[24]:
render('assets/transcription.ogg', [
    v.events(position=0, gain=2, sustain=0.05, fade=(0.0, 0.05)),
    cl.events(position=1, delay=0.)
])

[24]:
OfflineRenderer(outfile="assets/transcription.ogg", 2 channels, 20.10 secs, 44100 Hz)

[ ]: