Reverse Conducting Evaluation
Craig Stuart Sapp <craig@ccrma.stanford.edu>
15 Nov 2005

This is a generic Mathematica notebook used for examining the individual reverse conducting performances of Chopin Mazurkas compared to carefully corrected versions of the average beat tappings for the performance.

Mazurka in A Minor, Op. 7, No. 2
Chiu 1999

Loading the raw data

First load the absolute beat positions for the average beat positions and the corrected beat positions:

[Graphics:Images/index_gr_1.gif]
[Graphics:Images/index_gr_2.gif]
[Graphics:Images/index_gr_3.gif]

Now measure the time difference between each average and corrected beat.  A negative value means that the average beat occurs before the corrected beat time.

[Graphics:Images/index_gr_4.gif]
[Graphics:Images/index_gr_5.gif]

The correction needed to move the average tap time of all trials onto each corrected beat is typically around 48 milliseconds.  In other words, each averaged tap position must be moved by about 48 milliseconds, on average, to align with the corrected beat time.

[Graphics:Images/index_gr_6.gif]
[Graphics:Images/index_gr_7.gif]
[Graphics:Images/index_gr_8.gif]
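The two steps above (subtracting the corrected beat times from the average beat times, then averaging the size of the corrections) can be sketched as follows.  This is an illustrative sketch in Python rather than the notebook's Mathematica; the beat times are made-up values, not the Chiu 1999 data, the function names are hypothetical, and it assumes the 48 ms figure is a mean of absolute differences.

```python
# Hypothetical absolute beat times in seconds (NOT the measured data).
average_beats = [0.52, 1.10, 1.71, 2.25]
corrected_beats = [0.50, 1.12, 1.70, 2.30]

def beat_differences_ms(average, corrected):
    """Signed difference in ms; negative means the average tap is early."""
    if len(average) != len(corrected):
        raise ValueError("beat lists must have the same length")
    return [(a - c) * 1000.0 for a, c in zip(average, corrected)]

def mean_absolute_ms(differences):
    """Average size of the correction, ignoring its direction."""
    return sum(abs(d) for d in differences) / len(differences)

differences = beat_differences_ms(average_beats, corrected_beats)
mean_correction = mean_absolute_ms(differences)
```

Keeping the sign in the differences and dropping it only in the summary lets the same list be reused for a signed mean, which measures whether the taps are early or late overall.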

Manual correction of average reverse conducting beat positions

This section displays the difference between the average reverse conducted beat positions and the values derived from manually correcting these values by ear using a soundfile editor.

This corrected beat duration data was measured by listening to the audio file and manually locating the beat positions in the soundfile.  The initial positions of the average reverse conducting performance were used as a baseline, and this baseline was then adjusted so that it sounded as if all beats were occurring with the attacks of notes in the performance.  There are probably a few errors in the corrected data due to the tediousness of collecting it, and there are a few locations in the audio where the beat positions become vague.  Overall, however, this data represents a highly accurate description of the pianist's actual beat-by-beat performance tempo, probably accurate to within 10 milliseconds for all beats.

[Graphics:Images/index_gr_9.gif]
[Graphics:Images/index_gr_10.gif]
[Graphics:Images/index_gr_11.gif]

Here are the timing differences between the average reverse-conducting absolute times and the absolute times measured in a sound editor:

[Graphics:Images/index_gr_12.gif]
[Graphics:Images/index_gr_13.gif]

[Graphics:Images/index_gr_14.gif]

The average beats are not quite centered on the corrected beats.  The average beats are, on the average, 4.6 milliseconds after the corrected beats.

[Graphics:Images/index_gr_15.gif]
[Graphics:Images/index_gr_16.gif]

The histogram below shows the data from the plot above.  It generally shows a nicely distributed range of corrections.  The peak at 0 is due to corrections that were too small to make: these beats were left at their measured positions, although the theoretically correct values may lie up to about 10 milliseconds earlier.

<< Graphics`Graphics`
[Graphics:Images/index_gr_17.gif]

[Graphics:Images/index_gr_18.gif]
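As an illustration of how such a histogram is built, the following Python sketch (not the notebook's own Mathematica code) counts corrections in fixed-width bins.  The 10 ms bin width, the sample values, and the function name histogram_counts are assumptions made for this example.

```python
import math
from collections import Counter

def histogram_counts(corrections_ms, bin_width_ms=10.0):
    """Count corrections per bin; each key is the lower edge of its bin."""
    counts = Counter()
    for value in corrections_ms:
        lower_edge = math.floor(value / bin_width_ms) * bin_width_ms
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Made-up corrections in milliseconds; the zeros stand for uncorrected beats.
corrections = [0.0, 0.0, 0.0, 12.0, -8.0, 35.0, -41.0, 18.0]
counts = histogram_counts(corrections)
```

Because uncorrected beats have a difference of exactly zero, they all fall into the same bin, which is what produces a spike at 0.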

Trial Quality Measurements

Define a logarithmic score for the amount of correction needed for a given beat.  A score of 1 means 20 ms correction, 2 means 40 ms correction, 3 means 80 ms correction, 4 means 160 ms correction, 5 means 320 ms correction, etc.  A score of 0 means there was less than 1 millisecond correction necessary, and a score of 0.5 means 10 milliseconds were needed to correct the beat location.

Qualitatively, a score of 0-1 is an "A", 1-2 is a "B", 2-3 is a "C", 3-4 is a "D", 4-5 is an "E", and 5-6 is an "F".  A score of 6 means the correction was 640 milliseconds, or about 2/3 of a second.

[Graphics:Images/index_gr_19.gif]
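One function that reproduces the doubling pattern and the low-end values described above is sketched here in Python.  The notebook's actual definition is not visible in this export, so the piecewise form (linear below 20 ms, base-2 logarithm from 20 ms upward) is an assumption chosen to match the stated values.  The names correction_score and letter_grade, and the handling of exact boundaries and of scores of 6 or more, are likewise hypothetical.

```python
import math

def correction_score(correction_ms):
    """Assumed score: linear below 20 ms, log2(ms / 10) from 20 ms upward."""
    magnitude = abs(correction_ms)
    if magnitude < 20.0:
        return magnitude / 20.0          # 0 ms -> 0, 10 ms -> 0.5
    return math.log2(magnitude / 10.0)   # 20 -> 1, 40 -> 2, 80 -> 3, ...

def letter_grade(score):
    """Map a score to a letter; scores of 5 or more are clamped to 'F'."""
    grades = "ABCDEF"
    return grades[min(int(score), len(grades) - 1)]

example_scores = [correction_score(ms) for ms in (10, 20, 40, 80, 160, 640)]
```

The two pieces meet at 20 ms, where both give a score of 1, so the assumed score is continuous.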

The corrections for the average of the tapping trials are mostly evenly distributed among A, B, C, and D quality, although "A" corrections were the most common.  The worst-case correction had a score between 4 and 5.  It occurs on the first beat, where no previous event prepares the tap, so it is most likely due to the reaction time of the reverse conductor.  Other corrections in the range between 4 and 5 are likely due to unprepared changes in tempo.

<< Graphics`Graphics`
[Graphics:Images/index_gr_20.gif]

[Graphics:Images/index_gr_21.gif]

The range from 0-1 can be characterized as "no audible difference" (0-20 ms); 1-2 as "slight audible difference" (20-40 ms); 2-3 as "minor audible difference" (40-80 ms); 3-4 as "noticeable audible difference" (80-160 ms); 4-5 as "very noticeable audible difference" (160-320 ms); and anything higher as "extremely noticeable audible difference" (> 320 ms).

[Graphics:Images/index_gr_22.gif]
[Graphics:Images/index_gr_23.gif]

Here is a histogram display of the necessary corrections with positive and negative adjustments separated.  Negative values mean that the correct value occurs before the average reverse conducting time.  Notice that most of the larger corrections occur in the negative range.

[Graphics:Images/index_gr_24.gif]

[Graphics:Images/index_gr_25.gif]

Examine the learning curve

How does the quality of the reverse conducting improve over time? Does the conductor learn to follow the performance better after repeated listening?  First, load the individual tapping trials which contain precalculated offset values into the audio file of the performance recording:

[Graphics:Images/index_gr_26.gif]

Now calculate the timing differences between the individual trials and the corrected beat times.

[Graphics:Images/index_gr_27.gif]
[Graphics:Images/index_gr_28.gif]
[Graphics:Images/index_gr_29.gif]

Here are quality score plots for each trial:

[Graphics:Images/index_gr_30.gif]

[Graphics:Images/index_gr_31.gif]

[Graphics:Images/index_gr_32.gif]

[Graphics:Images/index_gr_33.gif]

[Graphics:Images/index_gr_34.gif]

[Graphics:Images/index_gr_35.gif]

[Graphics:Images/index_gr_36.gif]

[Graphics:Images/index_gr_37.gif]

[Graphics:Images/index_gr_38.gif]

[Graphics:Images/index_gr_39.gif]

[Graphics:Images/index_gr_40.gif]

[Graphics:Images/index_gr_41.gif]

[Graphics:Images/index_gr_42.gif]

[Graphics:Images/index_gr_43.gif]

[Graphics:Images/index_gr_44.gif]

[Graphics:Images/index_gr_45.gif]

[Graphics:Images/index_gr_46.gif]

[Graphics:Images/index_gr_47.gif]

[Graphics:Images/index_gr_48.gif]

[Graphics:Images/index_gr_49.gif]

[Graphics:Images/index_gr_50.gif]

How does the accuracy of the first half of the trials compare to the second half of the trials?

[Graphics:Images/index_gr_51.gif]
[Graphics:Images/index_gr_52.gif]

[Graphics:Images/index_gr_53.gif]

[Graphics:Images/index_gr_54.gif]

[Graphics:Images/index_gr_55.gif]

[Graphics:Images/index_gr_56.gif]
[Graphics:Images/index_gr_57.gif]
[Graphics:Images/index_gr_58.gif]
[Graphics:Images/index_gr_59.gif]

On the average, beats in the second half of the trials are 6 milliseconds closer to the "true" beat locations:

[Graphics:Images/index_gr_60.gif]
[Graphics:Images/index_gr_61.gif]
[Graphics:Images/index_gr_62.gif]
[Graphics:Images/index_gr_63.gif]
[Graphics:Images/index_gr_64.gif]
[Graphics:Images/index_gr_65.gif]
[Graphics:Images/index_gr_66.gif]
[Graphics:Images/index_gr_67.gif]
[Graphics:Images/index_gr_68.gif]
[Graphics:Images/index_gr_69.gif]
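The first-half versus second-half comparison can be sketched as follows.  This Python sketch assumes that "closer" is measured as the mean absolute distance of each trial from the corrected beats; the data are invented and the function names are hypothetical, so the numbers do not reproduce the 6 ms result.

```python
def mean_abs_error_ms(trial, corrected):
    """Mean absolute distance (ms) between one trial and the corrected beats."""
    total = sum(abs(t - c) for t, c in zip(trial, corrected))
    return total * 1000.0 / len(corrected)

def half_comparison(trials, corrected):
    """Return (first-half error, second-half error), averaged over trials."""
    errors = [mean_abs_error_ms(trial, corrected) for trial in trials]
    middle = len(errors) // 2
    first = sum(errors[:middle]) / middle
    second = sum(errors[middle:]) / (len(errors) - middle)
    return first, second

# Invented beat times in seconds; trials are listed in chronological order.
corrected = [0.50, 1.10, 1.70]
trials = [
    [0.56, 1.18, 1.76],
    [0.54, 1.15, 1.75],
    [0.53, 1.12, 1.73],
    [0.51, 1.12, 1.71],
]
first_half, second_half = half_comparison(trials, corrected)
improvement = first_half - second_half
```

A positive improvement means the later trials sit closer to the corrected beats than the earlier ones.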

Plotting the average displacement error for each trial

This section plots the average displacement error for each trial, which shows the gradual improvement in the reverse conducting over time.

[Graphics:Images/index_gr_70.gif]
[Graphics:Images/index_gr_71.gif]
[Graphics:Images/index_gr_72.gif]

[Graphics:Images/index_gr_73.gif]

Now fit an exponentially decreasing curve through the trial errors to model the "learning curve" for reverse conducting this performance.

[Graphics:Images/index_gr_74.gif]
[Graphics:Images/index_gr_75.gif]
[Graphics:Images/index_gr_76.gif]

[Graphics:Images/index_gr_77.gif]

[Graphics:Images/index_gr_78.gif]

[Graphics:Images/index_gr_79.gif]

Predictions of accuracy can be derived from the learning curve over time:

[Graphics:Images/index_gr_80.gif]
[Graphics:Images/index_gr_81.gif]
[Graphics:Images/index_gr_82.gif]
[Graphics:Images/index_gr_83.gif]
[Graphics:Images/index_gr_84.gif]
[Graphics:Images/index_gr_85.gif]
[Graphics:Images/index_gr_86.gif]
[Graphics:Images/index_gr_87.gif]
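A minimal Python sketch of fitting a decaying exponential to per-trial errors and extrapolating from it is given here.  The notebook's actual model is not visible in this export.  The sketch assumes the form error = a * exp(-b * trial) with no constant floor, fits it by linear least squares on the logarithm of the error, and uses synthetic data.  The names fit_exponential and predict_error_ms are hypothetical.

```python
import math

def fit_exponential(trial_numbers, errors_ms):
    """Least-squares fit of ln(error) = ln(a) - b * trial; returns (a, b)."""
    n = len(trial_numbers)
    logs = [math.log(e) for e in errors_ms]
    mean_x = sum(trial_numbers) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in trial_numbers)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(trial_numbers, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def predict_error_ms(a, b, trial_number):
    """Evaluate the fitted curve at any (possibly future) trial number."""
    return a * math.exp(-b * trial_number)

# Synthetic errors generated from a known curve, so the fit can be checked.
trial_numbers = list(range(1, 9))
errors_ms = [80.0 * math.exp(-0.1 * t) for t in trial_numbers]
a, b = fit_exponential(trial_numbers, errors_ms)
predicted_trial_30 = predict_error_ms(a, b, 30)
```

Fitting in the log domain weights small errors more heavily than a direct nonlinear fit would, and a model with a nonzero floor would need a nonlinear solver instead.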

Effect of removing the "worst" trial

What happens if the "worst" trial is removed from the average tapping time for each beat?

[Graphics:Images/index_gr_88.gif]
[Graphics:Images/index_gr_89.gif]
[Graphics:Images/index_gr_90.gif]
[Graphics:Images/index_gr_91.gif]
[Graphics:Images/index_gr_92.gif]
[Graphics:Images/index_gr_93.gif]

[Graphics:Images/index_gr_94.gif]

[Graphics:Images/index_gr_95.gif]
[Graphics:Images/index_gr_96.gif]
[Graphics:Images/index_gr_97.gif]

[Graphics:Images/index_gr_98.gif]

[Graphics:Images/index_gr_99.gif]

[Graphics:Images/index_gr_100.gif]

[Graphics:Images/index_gr_101.gif]
[Graphics:Images/index_gr_102.gif]
[Graphics:Images/index_gr_103.gif]
[Graphics:Images/index_gr_104.gif]

Trying to identify "worst" performances from the average

Instead of comparing the individual performances to the corrected data, compare them to the average of all trials.

[Graphics:Images/index_gr_105.gif]
[Graphics:Images/index_gr_106.gif]
[Graphics:Images/index_gr_107.gif]
[Graphics:Images/index_gr_108.gif]

[Graphics:Images/index_gr_109.gif]
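Ranking the trials by their distance from the all-trial average, with no reference to the corrected data, can be sketched as follows.  This is an illustrative Python sketch with invented beat times; it assumes mean absolute distance as the measure, and the function names are hypothetical.

```python
def beat_averages(trials):
    """Average tap time for each beat across all trials."""
    return [sum(beat) / len(beat) for beat in zip(*trials)]

def mean_abs_distance_ms(trial, reference):
    """Mean absolute distance (ms) between a trial and a reference."""
    total = sum(abs(t - r) for t, r in zip(trial, reference))
    return total * 1000.0 / len(reference)

def rank_trials_by_distance(trials):
    """Trial indices ordered from farthest to closest to the average."""
    average = beat_averages(trials)
    distances = [mean_abs_distance_ms(trial, average) for trial in trials]
    return sorted(range(len(trials)),
                  key=lambda i: distances[i], reverse=True)

# Invented beat times in seconds.
trials = [
    [0.50, 1.10, 1.70],
    [0.52, 1.12, 1.72],
    [0.60, 1.25, 1.80],  # deliberately the outlier
]
ranking = rank_trials_by_distance(trials)
```

Note that the outlier also pulls the average toward itself, so this ranking can differ from one made against the corrected beats.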

Dropping any single trial

Dropping the worst trial does not improve the accuracy of the average. Does dropping any other trial improve the average of all trials?

[Graphics:Images/index_gr_110.gif]
[Graphics:Images/index_gr_111.gif]
[Graphics:Images/index_gr_112.gif]
[Graphics:Images/index_gr_113.gif]
[Graphics:Images/index_gr_114.gif]

[Graphics:Images/index_gr_115.gif]
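The drop-one-trial test can be sketched as a leave-one-out loop: remove each trial in turn, re-average the remaining trials, and measure the error of that average against the corrected beats.  The Python below is illustrative only.  The data are invented, with an obvious outlier, so unlike the real data the toy result does improve when that trial is dropped; the function names are hypothetical.

```python
def beat_averages(trials):
    """Average tap time for each beat across the given trials."""
    return [sum(beat) / len(beat) for beat in zip(*trials)]

def mean_abs_error_ms(beats, corrected):
    """Mean absolute distance (ms) from the corrected beats."""
    total = sum(abs(b - c) for b, c in zip(beats, corrected))
    return total * 1000.0 / len(corrected)

def leave_one_out_errors(trials, corrected):
    """Entry i is the error of the average with trial i left out."""
    errors = []
    for dropped in range(len(trials)):
        kept = trials[:dropped] + trials[dropped + 1:]
        errors.append(mean_abs_error_ms(beat_averages(kept), corrected))
    return errors

# Invented beat times in seconds.
corrected = [0.50, 1.10, 1.70]
trials = [
    [0.50, 1.10, 1.70],
    [0.52, 1.12, 1.72],
    [0.60, 1.25, 1.80],
]
baseline = mean_abs_error_ms(beat_averages(trials), corrected)
errors = leave_one_out_errors(trials, corrected)
```

Comparing each entry of errors with baseline shows whether removing that trial helps or hurts the average.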

Dropping a range of trials

Now examine how the displacement error changes as more and more of the earlier trials are removed from the average.

[Graphics:Images/index_gr_116.gif]
[Graphics:Images/index_gr_117.gif]
[Graphics:Images/index_gr_118.gif]
[Graphics:Images/index_gr_119.gif]
[Graphics:Images/index_gr_120.gif]
[Graphics:Images/index_gr_121.gif]

[Graphics:Images/index_gr_122.gif]

Here is a plot created by dropping more and more of the later trials:

[Graphics:Images/index_gr_123.gif]
[Graphics:Images/index_gr_124.gif]
[Graphics:Images/index_gr_125.gif]
[Graphics:Images/index_gr_126.gif]
[Graphics:Images/index_gr_127.gif]
[Graphics:Images/index_gr_128.gif]

[Graphics:Images/index_gr_129.gif]

[Graphics:Images/index_gr_130.gif]

[Graphics:Images/index_gr_131.gif]

[Graphics:Images/index_gr_132.gif]
[Graphics:Images/index_gr_133.gif]
[Graphics:Images/index_gr_134.gif]
[Graphics:Images/index_gr_135.gif]
[Graphics:Images/index_gr_136.gif]

[Graphics:Images/index_gr_137.gif]

[Graphics:Images/index_gr_138.gif]

[Graphics:Images/index_gr_139.gif]

The red curve in the plot above shows the average displacement errors from the corrected times for each of the 20 individual trials.  The blue curve represents dropping more and more of the later trials.  The black curve represents dropping more and more of the earlier trials.
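The two range-dropping curves can be sketched in Python as follows.  The data are invented (four trials that improve steadily), the error measure is assumed to be the mean absolute distance of the averaged beats from the corrected beats, and the function names are hypothetical.

```python
def beat_averages(trials):
    """Average tap time for each beat across the given trials."""
    return [sum(beat) / len(beat) for beat in zip(*trials)]

def mean_abs_error_ms(beats, corrected):
    """Mean absolute distance (ms) from the corrected beats."""
    total = sum(abs(b - c) for b, c in zip(beats, corrected))
    return total * 1000.0 / len(corrected)

def errors_dropping_earliest(trials, corrected):
    """Entry k: error of the average with the first k trials removed."""
    return [mean_abs_error_ms(beat_averages(trials[k:]), corrected)
            for k in range(len(trials))]

def errors_dropping_latest(trials, corrected):
    """Entry k: error of the average with the last k trials removed."""
    count = len(trials)
    return [mean_abs_error_ms(beat_averages(trials[:count - k]), corrected)
            for k in range(count)]

# Invented beat times in seconds; trials are in chronological order.
corrected = [0.50, 1.10, 1.70]
trials = [
    [0.56, 1.18, 1.76],
    [0.54, 1.15, 1.75],
    [0.53, 1.12, 1.73],
    [0.51, 1.12, 1.71],
]
dropping_earliest = errors_dropping_earliest(trials, corrected)
dropping_latest = errors_dropping_latest(trials, corrected)
```

Both lists start from the error of the full average (nothing dropped) and end with the error of a single remaining trial.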

Offset sensitivity

Individual trials were aligned to the test case using a few points which seemed to be in a stable tempo region.  How accurate is the offset calculated from these few points in the audio?  Would the average of all trials improve if an offset is calculated for each trial based on the corrected data?

[Graphics:Images/index_gr_140.gif]
[Graphics:Images/index_gr_141.gif]
[Graphics:Images/index_gr_142.gif]
[Graphics:Images/index_gr_143.gif]
[Graphics:Images/index_gr_144.gif]
[Graphics:Images/index_gr_145.gif]
[Graphics:Images/index_gr_146.gif]
[Graphics:Images/index_gr_147.gif]
[Graphics:Images/index_gr_148.gif]
[Graphics:Images/index_gr_149.gif]
[Graphics:Images/index_gr_150.gif]
[Graphics:Images/index_gr_151.gif]
[Graphics:Images/index_gr_152.gif]
[Graphics:Images/index_gr_153.gif]
[Graphics:Images/index_gr_154.gif]
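One way to compute a per-trial offset from the corrected data is sketched here in Python.  It assumes a single constant shift per trial, chosen to minimise the squared distance to the corrected beats; for a constant shift, that optimum is the mean of the beat-by-beat differences.  The data are invented and the function names are hypothetical; the notebook's own procedure may differ.

```python
def best_offset(trial, corrected):
    """Constant shift (s) minimising squared error to the corrected beats."""
    return sum(t - c for t, c in zip(trial, corrected)) / len(corrected)

def apply_offset(trial, offset):
    """Shift every tap in the trial earlier by the given offset."""
    return [t - offset for t in trial]

# Invented beat times in seconds; the trial runs consistently late.
corrected = [0.50, 1.10, 1.70, 2.30]
trial = [0.58, 1.17, 1.79, 2.36]
offset = best_offset(trial, corrected)
realigned = apply_offset(trial, offset)
residual_offset = best_offset(realigned, corrected)
```

After realignment the remaining differences average to zero, so whatever error is left reflects beat-to-beat timing rather than a global misalignment.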

Comparing corrected beats to average reverse conducting beats

This section plots the average tempo range and compares it to the actual tempos measured from the audio file.

[Graphics:Images/index_gr_155.gif]
[Graphics:Images/index_gr_156.gif]
[Graphics:Images/index_gr_157.gif]
[Graphics:Images/index_gr_158.gif]
[Graphics:Images/index_gr_159.gif]
[Graphics:Images/index_gr_160.gif]

[Graphics:Images/index_gr_161.gif]

[Graphics:Images/index_gr_162.gif]

[Graphics:Images/index_gr_163.gif]

[Graphics:Images/index_gr_164.gif]
[Graphics:Images/index_gr_165.gif]
[Graphics:Images/index_gr_166.gif]

[Graphics:Images/index_gr_167.gif]

[Graphics:Images/index_gr_168.gif]

[Graphics:Images/index_gr_169.gif]

[Graphics:Images/index_gr_170.gif]

[Graphics:Images/index_gr_171.gif]

[Graphics:Images/index_gr_172.gif]

Here is a zoomed-in view of the tempo for every 8 measures.  The black line indicates the mean reverse-conducted durations for every beat.  The dark gray lines surrounding the average duration line are the 95% confidence range for the true mean, and the light gray lines indicate the maximum and minimum durations for each beat from all reverse conducting trials.

The red and blue dots indicate the beat event durations from the audio file, which are more accurate than the average reverse conducting durations and can be assumed to be the "correct" answer.  The red dots represent the first beat of a measure, and the blue dots represent the other beats in the measure.
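The quantities drawn in these plots (mean duration, a 95% confidence range for the mean, and the minimum and maximum across trials) can be sketched in Python as follows.  The data are invented and the function names are hypothetical.  The confidence range uses a normal approximation, which is an assumption; with only 20 trials, a Student t interval would be somewhat wider.

```python
import math
import statistics

def beat_durations(beat_times):
    """Time from each beat to the next, in seconds."""
    return [later - earlier
            for earlier, later in zip(beat_times, beat_times[1:])]

def duration_summary(trials, confidence=0.95):
    """Per beat: (mean, lower, upper, minimum, maximum) across trials."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    per_trial = [beat_durations(trial) for trial in trials]
    summary = []
    for values in zip(*per_trial):
        mean = statistics.mean(values)
        half_width = z * statistics.stdev(values) / math.sqrt(len(values))
        summary.append((mean, mean - half_width, mean + half_width,
                        min(values), max(values)))
    return summary

def tempo_bpm(duration_seconds):
    """Convert a beat duration in seconds to beats per minute."""
    return 60.0 / duration_seconds

# Invented absolute beat times in seconds for three trials.
trials = [
    [0.50, 1.10, 1.72, 2.30],
    [0.52, 1.10, 1.70, 2.32],
    [0.48, 1.12, 1.70, 2.28],
]
summary = duration_summary(trials)
```

Each tuple in summary corresponds to one beat-to-beat interval, so four beat times per trial yield three summarised durations.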

[Graphics:Images/index_gr_173.gif]

[Graphics:Images/index_gr_174.gif]

[Graphics:Images/index_gr_175.gif]

[Graphics:Images/index_gr_176.gif]

[Graphics:Images/index_gr_177.gif]

[Graphics:Images/index_gr_178.gif]

[Graphics:Images/index_gr_179.gif]

[Graphics:Images/index_gr_180.gif]

[Graphics:Images/index_gr_181.gif]

[Graphics:Images/index_gr_182.gif]

[Graphics:Images/index_gr_183.gif]

[Graphics:Images/index_gr_184.gif]

[Graphics:Images/index_gr_185.gif]

[Graphics:Images/index_gr_186.gif]

[Graphics:Images/index_gr_187.gif]

[Graphics:Images/index_gr_188.gif]


Converted by Mathematica      November 14, 2005