Automatic time-syncing feature


Automatic time-syncing feature

Raphaël Marinier
Hi Audacity developers,

One feature that I have been missing in Audacity is automatic time-syncing of two audio tracks.

The use case is when one has multiple recordings of the same event, not time-synced, and wants to align them together. This happens for instance when the two tracks come from multiple devices (e.g. video camera and portable audio recorder).
Right now, the user has to manually time-shift tracks to make sure they align, which is cumbersome and imprecise.

I've researched the subject a bit, and I think it would be doable to implement auto-syncing of tracks efficiently using a combination of audio fingerprinting (see for instance https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf) for approximate syncing and cross-correlation maximization for fine-tuning.

I could implement such a feature in Audacity as a new effect. Would this contribution be welcome in Audacity? Is it possible for the output of an effect to be a "time shift"?

Thanks,

Raphaël
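
For concreteness, here is a minimal sketch (Python/NumPy with SciPy; the function name is just for illustration) of the cross-correlation step I have in mind - only the fine-tuning part, assuming two mono tracks at the same, drift-free sample rate:

    import numpy as np
    from scipy.signal import correlate

    def estimate_offset_seconds(a, b, rate):
        """Estimate by how many seconds recording b started after recording a.

        Sketch only: full cross-correlation computed with FFTs (O(n log n)),
        for mono numpy arrays sharing one sample rate.
        """
        corr = correlate(a, b, mode="full", method="fft")
        lag = np.argmax(corr) - (len(b) - 1)   # best lag in samples
        return lag / rate                      # positive: b started later than a

The fingerprinting pass would mainly serve to narrow the search window before a step like this.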



Re: Automatic time-syncing feature

rbd

This is an interesting problem. Offhand, I would guess results would be better if you emphasized low frequencies in some way: high frequencies/short wavelengths are more easily reflected, so if the recording devices or objects around them move, reflection paths can change, shifting the timing by milliseconds (about 1 ms per foot of change in path length). Of course, low frequencies give less timing precision, so there's a tradeoff.

Another important consideration is the difference in sample rates between recordings. Even if two devices claim to record at 44.1 kHz, the *actual* sample rates are slightly different. A 0.01% difference, which is very likely in consumer devices, over 20 minutes (1200 s) of recording time would result in a drift of 0.12 s (!), so any time-syncing should estimate the time shift at multiple points and try to correct for sample-rate differences.
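
To illustrate the low-frequency emphasis, a sketch in Python/SciPy (the function name and cutoff value are made up, not a recommendation):

    import numpy as np
    from scipy.signal import butter, filtfilt, correlate

    def lowpass_offset_seconds(a, b, rate, cutoff_hz=500.0):
        # Emphasize low frequencies before correlating: a 4th-order Butterworth
        # low-pass, applied forwards and backwards so it adds no phase shift.
        bb, aa = butter(4, cutoff_hz, btype="low", fs=rate)
        a_lp = filtfilt(bb, aa, a)
        b_lp = filtfilt(bb, aa, b)
        corr = correlate(a_lp, b_lp, mode="full", method="fft")
        return (np.argmax(corr) - (len(b_lp) - 1)) / rate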

-Roger


Re: Automatic time-syncing feature

James Crook
In reply to this post by Raphaël Marinier
Raphaël, we are interested, but in something more general than this.
http://wiki.audacityteam.org/wiki/Proposal_Audio_Diff

Aligning two tracks is a special case.
You should look at Vamp plug-ins for Audacity.  These can do analysis, not just effects.
Here is some information about the MATCH plug-in for audio diff:

https://code.soundsoftware.ac.uk/projects/match-vamp
Calculate alignment between two performances in separate channel inputs.

The code for doing the alignment is one part of the problem.  We also need to design a good user interface for using it.  My view is that when designing an interface to align two audio sequences without inserting gaps, we should at the same time be thinking about the interface for aligning them with gaps.  Otherwise we will eventually end up with two different interfaces doing 'the same thing'.

I would very much like it if you worked with the Vamp MATCH plug-in, got the details sorted out, and wrote it up for the manual, so that we would want to ship it with Audacity.

--James.
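
For a feel of what MATCH computes, here is a minimal offline dynamic-time-warping sketch over per-frame features (Python/NumPy, illustrative only; the real plug-in uses an on-line, bounded-memory variant and its own feature set):

    import numpy as np

    def dtw_path(F, G):
        """Minimal offline dynamic-time-warping sketch.

        F, G: 2-D arrays of per-frame features (frames x dims), e.g. short-time
        spectra.  Returns a list of (i, j) frame pairs forming the alignment path.
        """
        n, m = len(F), len(G)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(F[i - 1] - G[j - 1])      # local frame distance
                cost[i, j] = d + min(cost[i - 1, j],         # insertion
                                     cost[i, j - 1],         # deletion
                                     cost[i - 1, j - 1])     # match
        # Backtrack from the end to recover the warping path.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]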




Re: Automatic time-syncing feature

Bill Unruh
In reply to this post by Raphaël Marinier
On Sun, 26 Jun 2016, Raphaël Marinier wrote:

> Hi Audacity developers,
> One feature that I have been missing in Audacity is automatic time-syncing of two audio tracks.
>
> The use case is when one has multiple recordings of the same event, not time-synced, and wants to
> align them together. This happens for instance when the two tracks come from multiple devices
> (e.g. video camera and portable audio recorder).
> Right now, the user has to manually time-shift tracks to make sure they align, which is cumbersome
> and imprecise.

Well, time shift is not the only problem, since most recordings are not at the
same frequency even if they have the same nominal frequency. 44100 vs 48000 is
obvious, but 44100 vs 44150 is far more likely with standard consumer-grade
sound cards. Of course one could break the recording up into blocks and
time-shift each one. (For example, it would take about 880 seconds for the two
frequencies above to drift out of sync by 1 second, so time-shifting once a
second could be done. But even then, the dropping or adding of 50 frames per
second would surely be noticeable.) I.e., one should also do frequency shifting
for it to work. One could of course compute the time shift at the beginning and
the end of a block and use the difference to also implement a frequency shift.
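
A rough sketch of that last idea (Python/NumPy/SciPy; the function names and window length are made up, and sign conventions are glossed over):

    import numpy as np
    from scipy.signal import correlate

    def best_lag(a, b):
        # Lag (in samples) of b against a, via FFT cross-correlation.
        corr = correlate(a, b, mode="full", method="fft")
        return np.argmax(corr) - (len(b) - 1)

    def estimate_rate_ratio(a, b, rate, window_s=30.0):
        # Measure the best lag near the start and near the end of the
        # recordings; the change in lag over the elapsed samples gives a
        # rough estimate of the effective sample-rate ratio of b vs a.
        w = int(window_s * rate)
        lag_start = best_lag(a[:w], b[:w])
        lag_end = best_lag(a[-w:], b[-w:])
        span = len(a) - w
        return 1.0 + (lag_end - lag_start) / span

    def resample_by_ratio(b, ratio):
        # Crude linear-interpolation resampler, just to show the correction.
        new_len = int(round(len(b) / ratio))
        x_new = np.linspace(0, len(b) - 1, new_len)
        return np.interp(x_new, np.arange(len(b)), b)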




Re: Automatic time-syncing feature

Raphaël Marinier
A varying sampling rate is indeed an issue that will need to be taken care of. Do the actual frequencies of multiple devices tend to differ only by a constant multiplier (e.g. a constant 44100 vs a constant 44150), or is it common to have small variations of the sampling rate within a recording from a single device (e.g. a device first records at 44100 and then drifts to 44150)? The former is of course easier to solve.

James, thanks for the background and advice.
Indeed the "Audio Diff" proposal is more general. It also seems considerably harder, at least because of the variations in the way of playing, speed and potential gaps, as you mentioned, and because of all the UI questions around handling imprecise and partial matches, time expansion, errors, etc. Also, the algorithms will of course be more generic and complex than for aligning two recordings of the same performance. I had a quick look at the MATCH paper, and the maximum errors for commercial recordings on page 5 show that the algorithm is far from perfect.

I'll have a look into the MATCH plugin and do some tests. Do you think there would be space for both features: (1) simple alignment of N recordings of the same sound (my original proposal), and (2) Audio Diff, with an advanced UI to visualize and work with diffs? Is there any other software doing (2), so that we can get an idea of the user experience?

Raphaël



Re: Automatic time-syncing feature

James Crook
On 6/26/2016 9:11 PM, Raphaël Marinier wrote:
> Do you think there would be space for both features: (1) Simple
> alignment of N recordings of the same sound (my original proposal) (2)
> Audio Diff, with advanced UI to visualize and work with diffs?
Yes.
All I am suggesting is that in designing the UI for the special case the
more general case be thought about.

For example, in the general case we might have indications of how
stretchy different parts of the audio are.  Silence and vowel sounds are
stretchy; percussion sounds are not.  The visuals and interaction for
indicating that could be used for the 'stretchy' pieces at the beginnings
and ends of the audio in the simpler case of just time-shifting a whole
sequence without otherwise changing it.

Something else to think about is what happens if you attempt to align
two mono tracks that happen actually to be left and right audio of a
stereo track.  Under the hood you really need source separation, to pick
out the central instruments, and delayed right and left instruments
(respectively).  The alignment of left and right audio channels is
ambiguous if you are not allowed to split the sources out.

> Is there any other software doing (2), so that we can have an idea of
> the user experience?
I'm not aware of it for audio, but have not researched it.

I am aware of it for DNA and protein sequence alignment editors (from
the 90s).  The sync-lock we have in Audacity is a starting point for a
manual alignment editor.  We would need to be able to lock particular
segments of audio together, not just whole sequences, and to be able to
turn those local sync-locks on and off easily.  Our time ruler would need
to allow insertions and deletions in it, just like any other track.  One
exercise is to think about how the 'Truncate Silence' effect would look
if it did the same thing as now but affected the timeline/ruler rather
than the waveform.

As well as an alignment view we would want a dotplot view.
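
For the dotplot idea, a minimal sketch (Python/NumPy, assuming per-frame feature vectors such as short-time spectra; the threshold value is arbitrary):

    import numpy as np

    def dotplot(F, G, threshold=0.9):
        """Similarity 'dotplot' between two feature sequences (frames x dims).

        Entry (i, j) is True where frame i of the first track and frame j of
        the second are similar (cosine similarity above the threshold).
        Long diagonals indicate aligned passages.
        """
        Fn = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
        Gn = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
        return (Fn @ Gn.T) >= threshold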

--James.



Re: Automatic time-syncing feature

Federico Miyara
In reply to this post by Raphaël Marinier

Raphael,

If the sample rate is derived from a crystal oscillator (as I think is the case for the vast majority of A/D converters), the following link lists a number of causes of frequency drift, for instance temperature, warm-up, hysteresis and aging:

http://kunz-pc.sce.carleton.ca/thesis/CrystalOscillators.pdf

Temperature variation seems to be the most relevant cause of short-term drift.

In the very worst case (a very cheap crystal) we may be around the 0.01% mentioned by Roger Dannenberg (though he may have been talking about nominal frequency offset errors). Assuming the temperature variation is bounded to about 20 °C during recording, we would have a variation of at most 0.005%. This means a drift of about 90 ms per hour (assuming a steady increase of temperature with time), so the effect may actually be relevant.

Using not-so-cheap converters that include temperature-compensated crystals, we are probably at least one order of magnitude below that, i.e. about 10 ms per hour. That may still be too much for certain applications.

This would indicate that dynamic synchronization is worth doing.

Note that, as figure 3 of the linked article suggests, due to manufacturing tolerance in the cutting angle of the crystal, two particular recorders may have opposite drifts with temperature. The best one should be used as the master.
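
The 90 ms figure follows from the steady-warm-up assumption; as a quick check (Python):

    def drift_seconds(ppm_max, duration_s):
        # Accumulated drift when the frequency error ramps linearly from 0 to
        # ppm_max over the recording (steady warm-up), hence the factor 1/2.
        return 0.5 * ppm_max * 1e-6 * duration_s

    print(drift_seconds(50.0, 3600.0))   # 0.005% = 50 ppm over 1 hour -> 0.09 s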

Regards,

Federico




Re: Automatic time-syncing feature

rbd
In reply to this post by Raphaël Marinier

I don't have extensive experience or measurements, but I believe frequencies tend to differ mainly by a scale factor, because they're all crystal-controlled and the error comes from uncalibrated, inexpensive crystals. However, inexpensive devices are also not thermally compensated, so if you turn on a cold, cheap converter and it heats up, you should expect a small drift over the period while it warms up. Once things warm up, they'll still drift according to power supplies, phase of the moon, etc., but I think that variation will be an order of magnitude less than the calibration and warm-up effects. -Roger



Re: Automatic time-syncing feature

rbd
In reply to this post by James Crook
Excellent point. Also, aligning anything to a stereo track will generate
similar problems. I would suggest that if you're recording with multiple
microphones and devices, you're guaranteed to hit phase and multiple-source
problems. In the spirit of the "principle of least surprise" I
would expect an alignment effect to just do a reasonable job given the
sources. E.g. if acoustic sources are spread over 10 meters (~30 ms at
the speed of sound), I'd hope individual sources would be aligned to within
30 ms. If there were a single source, I'd hope for much better.

Another possibility is aligning to multiple tracks representing the same
collection of sound sources recorded from different locations. It's
subtly different from aligning to a single track.

-Roger

On 6/26/16 7:01 PM, James Crook wrote:
> Something else to think about is what happens if you attempt to align
> two mono tracks that happen actually to be left and right audio of a
> stereo track.



Re: Automatic time-syncing feature

Robert Hänggi
Hi
Incidentally, I've just stumbled over a real-life example where this
alignment would really be of great use to me.
I'm modelling a CD4 demodulation plug-in.
For the background see:
http://forum.audacityteam.org/viewtopic.php?p=307553#p307553
There are also two test (calibration) recordings in that specific post.

In essence, four tracks are embedded in a single stereo track.
The aim is to reverse-engineer what is in a hardware phono demodulator.
I can demodulate the signal; however, there are some difficulties in
properly aligning it with the base audio:
Base left = LFront + LBack (for normal stereo playback)
FM left   = LFront - LBack
(ditto for right)
Thus, I can't simply align them until they cancel.
What's more, the frequencies do not match exactly, because we have RIAA
equalization in combination with a noise-reduction expander, a delay
caused by the low/high-pass filters, etc.

In summary, the alignment has to be very exact but at the same time
insensitive to noise, phase and amplitude deviations, and so on.
For the moment, I will use cross-correlation and least-squares fitting
for certain "anchor" points.
I look forward to seeing the aligning feature implemented in
Audacity someday. Good luck.

Cheers
Robert
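
For what it's worth, a minimal sketch of the anchor-point approach I intend to use (Python/NumPy/SciPy; names, window length and anchor times are only illustrative):

    import numpy as np
    from scipy.signal import correlate

    def anchor_lags(a, b, rate, anchor_times_s, window_s=1.0):
        # Best lag (in seconds) of b against a around each anchor time,
        # measured by windowed FFT cross-correlation.
        w = int(window_s * rate)
        lags = []
        for t in anchor_times_s:
            i = int(t * rate)
            wa, wb = a[i:i + w], b[i:i + w]
            corr = correlate(wa, wb, mode="full", method="fft")
            lags.append((np.argmax(corr) - (len(wb) - 1)) / rate)
        return np.array(lags)

    def fit_offset_and_drift(anchor_times_s, lags):
        # Least-squares straight line lag(t) = offset + drift * t.
        drift, offset = np.polyfit(anchor_times_s, lags, 1)
        return offset, drift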



Re: Automatic time-syncing feature

Raphaël Marinier
Thanks for the information.

I did some testing of the MATCH Vamp plugin, running it via sonic analyzer, which already integrates it.

First of all, the algorithm is pretty expensive, and its runtime seems linear in the maximum time shift allowed. Aligning two 1-hour tracks, with a maximum allowed time shift of 60 s, takes 6 minutes on a recent processor (Intel i5-5200U) and about 8 GB of RAM. Using it for larger time shifts, such as 10 minutes, would be quite expensive...

I also tested the quality of the results, to the extent sonic analyzer allowed me - it can only report graphical results of the alignment analysis, but does not actually align the tracks.

(1) Two identical audio tracks of a recorded concert, with a time shift of about 15 s between them.
Alignment seems perfect.

(2) Two identical audio tracks of a recorded concert, except for a 30 s hole filled with pink noise, with a time shift of about 15 s between them.
There are 1-2 second zones at the boundaries of the hole where the audio is wrongly aligned. This will be quite problematic when building a feature that allows mixing and matching different versions of each passage.

(3) Two audio tracks recorded from the same concert (left and right channels from the same device), except for a 30 s hole filled with pink noise, with a time shift of about 15 s between them.
Same issues as (2), no new issues.

(4) Two audio tracks of the same concert, recorded with two different devices.
Throughout the match, it finds tempo ratios as divergent as <0.8 or >1.2 a significant fraction of the time. This is pretty bad, since a correct match should find a tempo ratio of 1 throughout the recording. Things can be improved by using non-default parameters - lowering the cost of the diagonal to 1.5 and enabling the "path smoothing" feature - but the tempo ratio still routinely hovers around 0.9-1.1.

(5) Two recordings of two different performances of the same composition, with a time shift of about 15 s and a hole of about 30 s.
Default parameters lead to big issues at the boundaries around the hole (10 s and 30 s of incorrect matches).
However, using a non-default cost for the diagonal again significantly improves the match by mostly fixing the boundaries around the hole. There is still a small issue with the first 0.5 s of the performance, which remains incorrectly matched.
I cannot really evaluate the match more than that, because sonic analyzer just produces the graphs but does not actually match the tracks.

My conclusion is that the MATCH plugin cannot be used that easily, even for the simple case of two recordings of the same event, because of accuracy and performance issues. The former could be fixed by imposing stronger regularity on the path (e.g. piecewise linear). The latter might be harder.

I propose to start working on an algorithm and feature specific to the case of two recordings of the same event, which is an easier case to start with, both in terms of algorithm and UI.
I also agree that we won't be able to align perfectly, in particular because of stereo. All we can do is best effort given the sources. I will allow for piecewise-linear ratios between frequencies (with additional regularity restrictions), to account for varying clock drifts.
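
A rough sketch of the piecewise-linear drift model I have in mind (Python/NumPy/SciPy; window, hop and knot choices are placeholders, and a real implementation would fit the knots robustly with regularity constraints):

    import numpy as np
    from scipy.signal import correlate

    def windowed_lags(a, b, rate, window_s=30.0, hop_s=60.0):
        # Lag (in seconds) of b against a, measured independently every hop_s.
        w, hop = int(window_s * rate), int(hop_s * rate)
        times, lags = [], []
        for start in range(0, min(len(a), len(b)) - w, hop):
            corr = correlate(a[start:start + w], b[start:start + w],
                             mode="full", method="fft")
            times.append(start / rate)
            lags.append((np.argmax(corr) - (w - 1)) / rate)
        return np.array(times), np.array(lags)

    def piecewise_linear_warp(times, lags, knot_times):
        # Sample the measured lag curve at a few knots; np.interp then gives a
        # piecewise-linear warp mapping a-time t to the matching b-time.
        knot_lags = np.interp(knot_times, times, lags)
        return lambda t: t + np.interp(t, knot_times, knot_lags)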

Cheers,

--
Raphaël

Re: Automatic time-syncing feature

James Crook
Sorry for the delay in getting back to you on this thread.


If you do use a dynamic programming approach, there is a neat trick I invented (in the context of DNA sequence matching) that caters for different kinds of matching.  The trick is to run two 'match matrices' at the same time, with a penalty for switching between them.  This is excellent where there is a mix of signal and noise, as in your test examples.  For aligning noise you want a fairly sloppy, not very precisely discriminating comparison that picks up broad characteristics.  What's great about running two match matrices is that the algorithm naturally switches into using the best kind of matching for different sections.

On storage requirements, these can be reduced dramatically relative to MATCH, even allowing large time shifts, by a divide-and-conquer approach.  Instead of allocating space of length x max-shift, you sample evenly and only allocate space of k x max-shift for some small value of k such as 100.  The cost is that you have to repeat the analysis log(length-of-sequence) times, where the log is to base k.  So aligning to the nearest 10 ms on two 1-hour sequences with a shift of up to 20 minutes would take 50 MB of storage (with one match matrix) or 100 MB (with two in parallel), and the analysis would be repeated 3 times.  Because you stay in cache during the analysis and write much less to external memory, it's a big net win in both storage and speed over a single-pass approach.

I haven't written versions for sound.  This is extrapolating from back in old times, in the late 80's when I was analysing DNA and protein sequences on a PC with a fraction of the power and storage of modern PCs.  You had to be inventive to get any decent performance at all.  This kind of trick can pay off in a big way, even today.

I can spell out more detail if you do go down the dynamic programming route, as I realise I have been a bit abbreviated in my description here!
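
To make the two-matrix trick concrete, a minimal sketch (Python/NumPy; the 'strict' and 'sloppy' distances and the penalty value are placeholders):

    import numpy as np

    def dual_matrix_alignment_cost(F, G, switch_penalty=5.0):
        """Sketch of the two-match-matrix trick for a DTW-style alignment.

        Two DP matrices are filled in parallel: one using a strict frame
        distance and one using a sloppy (coarse) distance.  Each cell may also
        be reached from the other matrix by paying switch_penalty, so the best
        path is free to use strict matching in clean passages and sloppy
        matching in noisy ones.  Returns the total cost of the best alignment.
        """
        def strict(i, j):
            return np.linalg.norm(F[i] - G[j])

        def sloppy(i, j):
            # Coarse comparison: overall frame energy only.
            return abs(np.linalg.norm(F[i]) - np.linalg.norm(G[j]))

        n, m = len(F), len(G)
        A = np.full((n + 1, m + 1), np.inf)   # paths currently in 'strict' mode
        B = np.full((n + 1, m + 1), np.inf)   # paths currently in 'sloppy' mode
        A[0, 0] = B[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                prev_a = min(A[i - 1, j], A[i, j - 1], A[i - 1, j - 1])
                prev_b = min(B[i - 1, j], B[i, j - 1], B[i - 1, j - 1])
                A[i, j] = strict(i - 1, j - 1) + min(prev_a, prev_b + switch_penalty)
                B[i, j] = sloppy(i - 1, j - 1) + min(prev_b, prev_a + switch_penalty)
        return min(A[n, m], B[n, m])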

--James.

Re: Automatic time-syncing feature

Vaughan Johnson-4
James: "This is extrapolating from back in old times, in the late 80's when I was analysing DNA and protein sequences..."


Didn't know that!  I was doing similar work then, with Blackboard systems, on the PROTEAN project at Stanford KSL: http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870014670.pdf

Yes, I've known about dynamic programming since about then. Good work, James -- I like your trick.

-- V

On Wed, Jul 13, 2016 at 3:02 PM, James Crook <[hidden email]> wrote:
Sorry for the delay in getting back to you on this thread.


If you do use a dynamic programming approach, there is a neat trick I invented (in context of DNA sequence matching) that caters for different kinds of matching.  The trick is to run two 'match matrices' at the same time, and have a penalty for switching between them.  This is excellent where there is a mix of signal and noise, as in your test examples.  For aligning noise you want a fairly sloppy not very precisely discriminating comparison that is picking up broad characteristics.  What's great about running two match matrices is that the algorithm naturally switches in to using the best kind of matching for different sections. 


On storage requirements, these can be reduced dramatically relative to MATCH, even allowing large time shifts, by a divide and conquer approach.  Instead of allocating space length x max-shift you sample evenly and only allocate space of k x max-shift for some small value of k such as 100.  The cost is that you have to repeat the analysis log( length-of-sequence) times, where log is to the base k.  So aligning to the nearest 10ms on two 1hr sequences with a shift of up to 20 mins would take 50Mb storage (if one match matrix) or 100Mb (with two in parallel), and the analysis would be repeated 3 times.  Because you stay in cache in the analysis and write much less to external memory it's a big net win both in storage and speed over a single pass approach.

I haven't written versions for sound.  This is extrapolating from back in old times, in the late 80's when I was analysing DNA and protein sequences on a PC with a fraction of the power and storage of modern PCs.  You had to be inventive to get any decent performance at all.  This kind of trick can pay off in a big way, even today.

I can spell out in more detail if you might go down the dynamic programming route, as I realise I have been a bit abbreviated in my description here!

--James.





On 7/7/2016 11:00 PM, Raphaël Marinier wrote:
Thanks for the information.

I did some testing of the MATCH vamp plugin, running it via sonic analyzer, which integrates it already.

First of all, the algorithm is pretty expensive, and its runtime seems linear in the max time shift allowed. For aligning two 1h tracks, with a max allowed time shift of 60s, it takes 6 minutes on a recent processor (Intel i5-5200U), and takes about 8GB of RAM. Using is for largeer time shifts such as 10 minutes will be quite expensive...

I also tested the quality of the results, to the extent sonic-analyzer allowed me - it can only report graphical results of the alignment analysis, but does not actually align the tracks.

(1)  2 identical audio tracks of a recorded concert, with a time-shift of about 15s between them.
Alignment seems perfect.

(2) 2 identical audio tracks of a recorded concert, except for a 30s hole filled with pink noise, with a time-shift of about 15s between them.
There are 1-2 second zones at the boundaries of the hole where the audio is wrongly aligned. This will be quite problematic when building a feature that allows mix and matching different versions of each passage.

(3) 2 audio tracks recorded from the same concert (left right channels from same device), except for a 30s hole filled with pink noise, with a time-shift of about 15s between them.
Sames issues as (2), no new issues.

(4) 2 audio tracks of the same concert, recorded with 2 different devices.
Throughout the match, it finds ratios of tempos that are as divergent as <0.8 or >1.2 a significant fraction of the time. This is pretty bad since a correct match should find a tempo ratio of 1 throughout the recording. Things can be improved using non-default parameters of lowering the cost of the diagonal to 1.5, and enabling the "path smoothing" feature, but tempo ratio still routinely hovers around 0.9 - 1.1.

(5) 2 recordings of two performances of the same composition, time shift of about 15s, and hole of about 30s.
Default parameters lead to big issues at boundaries around the hole (10s and 30s of incorrect matches). 
However, using non-default cost for diagonal again significantly improves the match by mostly fixing the boundaries around the hole. There is still a small issue with the first 0.5s of the performance that remains incorrectly matched.
I cannot really evaluate the match more than that, because sonic-analyzer just produces the graphs, but does not actually match the tracks.

My conclusion is that the match plugin cannot be used that easily, even for the simple case of 2 recordings of the same event, because of accuracy and performance. The former could be fixable by imposing stronger regularity of the path (e.g. piecewise linear). The latter might be harder.

I propose to start working on an algorithm and feature specific to the case of 2 recordings of the same event, which is an easier case to start with both in terms of algorithm and UI.
I also agree that we won't be able to align perfectly, in particular because of stereo. All we can do is best-effort given the sources. I will allow for piecewise linear ratios between frequencies (with additional regularity restrictions), to account for varying clock drifts.

Cheers,

--
Raphaël





On Mon, Jun 27, 2016 at 9:19 AM, Robert Hänggi <[hidden email]> wrote:
Hi
Incidentally, I've just stumbled over a real-life example where this
alignment would really be of great use to me.
I'm modelling a CD4 demodulation plug-in.
For the background see:
http://forum.audacityteam.org/viewtopic.php?p=307553#p307553
There are also two test (calibration) recordings in this specific post.

In essence, four tracks are embedded in a single stereo track.
The aim is to reverse-engineer what is in a hardware phono demodulator.
I can demodulate the signal, however, there are some difficulties in
properly aligning it with the base audio:
Base left=LFront + LBack (for normal stereo playback)
FM Left= LFront - LBack
(ditto for right)
Thus, I can't simply align them until they cancel.
What's more, the frequencies do not match exactly because we have RIAA
in combination with a noise reduction expander, a delay caused by the
low/high pass filter etc.

In summary, the alignment had to be very exact but at the same time
insensitive to noise, phase & amplitude deviations, and on and on...
For the moment, I will use cross-correlation and least square fitting
for certain "anchor" points.
I look forward to seeing the aligning feature someday implemented in
Audacity. Good luck.

Cheers
Robert


2016-06-27 2:38 GMT+02:00, Roger Dannenberg <[hidden email]>:
> Excellent point. Also, aligning anything to a stereo track will generate
> similar problems. I would suggest that if you're recording with multiple
> microphones and devices, you're guaranteed to hit phase and multiple
> source problems. In the spirit of the "principle of least surprise" I
> would expect an alignment effect to just do a reasonable job given the
> sources. E.g. if acoustic sources are spread over 10 meters (~30ms at
> the speed of sound), I'd hope individual sources would be aligned within
> 30ms. If there were a single source, I'd hope for much better.
>
> Another possibility is aligning to multiple tracks representing the same
> collection of sound sources recorded from different locations. It's
> subtly different from aligning to a single track.
>
> -Roger
>
> On 6/26/16 7:01 PM, James Crook wrote:
>> Something else to think about is what happens if you attempt to align
>> two mono tracks that happen actually to be left and right audio of a
>> stereo track.
>
>



Re: Automatic time-syncing feature

Raphaël Marinier
Hi all,

After almost one year, I finally managed to spend some time on a
prototype implementation in Audacity that aligns different recordings
of the same event.

You can see the code there:
https://github.com/RaphaelMarinier/audacity/commit/3276106c66c35e390c8169d0ac9bfab22e352567

The algorithm is as follows:
1. Summarize each track by computing summary values on a sliding time
window. Typically the window is 25ms.
2. Compute the cross-correlation between the summaries. This is done
in O(n log n) thanks to the FFT and convolution theorem.
3. Find the best shift from the cross-correlation function.
4. Split summaries into small chunks, and align them 1:1. This allows
detecting small clock speed differences between devices. It has been
tested successfully with 0.01% clock speed difference on 1h long
tracks.
5. Apply the shift, and resample one track if need be.

There are multiple algorithms and parameters that can be chosen at
each step, in particular regarding summarization of a window of audio
data, and finding the best peaks from the cross-correlation function.
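
To make steps 2 and 3 concrete, here is a minimal, self-contained C++
sketch (illustrative only, not the code from the commit above; the
names Summary and BestLag are made up for the example). For readability
it uses the direct O(n*m) form of the cross-correlation; the prototype
computes the same quantity in O(n log n) with an FFT, as described in
step 2.

// Cross-correlate two per-window summary vectors (e.g. one RMS value
// per 25ms window) and pick the lag with the highest score (steps 2-3).
#include <cstdio>
#include <vector>

using Summary = std::vector<double>;

// Returns the lag of `b` relative to `a` (in windows) that maximizes
// the cross-correlation, searching lags in [-maxLag, maxLag].
long BestLag(const Summary& a, const Summary& b, long maxLag) {
    long bestLag = 0;
    double bestScore = -1e300;
    for (long lag = -maxLag; lag <= maxLag; ++lag) {
        double score = 0.0;
        for (long i = 0; i < (long)a.size(); ++i) {
            long j = i + lag;
            if (j < 0 || j >= (long)b.size()) continue;
            score += a[i] * b[j];
        }
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}

int main() {
    // Toy example: b is a copy of a delayed by 3 windows (75ms at 25ms per window).
    Summary a = {0, 0, 1, 4, 2, 1, 0, 0, 0, 0};
    Summary b = {0, 0, 0, 0, 0, 1, 4, 2, 1, 0};
    std::printf("best lag = %ld windows\n", BestLag(a, b, 5)); // prints 3
    return 0;
}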

I created a benchmark out of a few recordings, with a few automated
audio transformations (low pass, high pass, forced clock speed
difference, etc.). With the best parameters, I get about a 96% success
rate out of 150 audio pairs.
The run time is pretty reasonable, taking less than 10s for 1h audio
tracks on a recent laptop (plus resampling time if needed), and memory
requirements are very small (on the order of 3 MB for two 1h tracks).
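
For reference, this is the kind of transformation the benchmark applies
to derive a test pair with a known ground truth (a sketch under
assumptions, not the actual benchmark code; MakeTestPair is an invented
name): the second track is the first one delayed by a known offset and
played back with a slightly different clock speed, so the aligner's
output can be checked against the known shift and drift.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Derive a test signal from `input` by delaying it by `offsetSamples`
// and simulating a clock-speed difference of `speedFactor` (e.g. 1.0001
// for a 0.01% fast clock). Linear interpolation stands in for proper
// resampling.
std::vector<float> MakeTestPair(const std::vector<float>& input,
                                double offsetSamples, double speedFactor) {
    std::vector<float> out(input.size(), 0.0f);
    for (std::size_t n = 0; n < out.size(); ++n) {
        double t = (static_cast<double>(n) - offsetSamples) * speedFactor;
        if (t < 0.0 || t >= static_cast<double>(input.size()) - 1.0) continue;
        std::size_t i = static_cast<std::size_t>(t);
        double frac = t - static_cast<double>(i);
        out[n] = static_cast<float>((1.0 - frac) * input[i] + frac * input[i + 1]);
    }
    return out;
}

int main() {
    const double kPi = 3.14159265358979323846;
    std::vector<float> a(44100);  // 1s of 440Hz at 44.1kHz
    for (std::size_t n = 0; n < a.size(); ++n)
        a[n] = static_cast<float>(std::sin(2.0 * kPi * 440.0 * n / 44100.0));
    auto b = MakeTestPair(a, 4410.0, 1.0001);  // 0.1s shift, 0.01% drift
    std::printf("generated %zu samples\n", b.size());
    return 0;
}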

Would you like to have this in Audacity? If yes, what would be the
best way to integrate it? Note that we need to be able to shift tracks
by some offset, and resample them if need be. Does any plugin system
allow shifting the tracks without having to rewrite the samples?
Should this feature just be integrated as an ad-hoc internal Audacity
feature (for example shown in the Tracks menu)?

There are of course some limitations that should still be addressed:
- Sync lock track group handling.
- Alignment uses left channel only. We might want to make this configurable.
- If the time drift is very small, we may want to avoid resampling tracks.
- We could use a much smaller time window in the second alignment
phase. This could make the alignment more precise, while still keeping
the algorithm fast.

The benchmarking code is completely ad hoc; it would also be great to
find a way to run this kind of automated benchmark in a uniform way
across the Audacity code base (I guess other parts of Audacity could
benefit as well).

James, thanks for your algorithmic suggestions. For now I went the
route of using a mix of global and local cross-correlation.

Raphaël

On Thu, Jul 14, 2016 at 12:26 AM, Vaughan Johnson <[hidden email]> wrote:

> James: "This is extrapolating from back in old times, in the late 80's when
> I was analysing DNA and protein sequences..."
>
>
>
> Didn't know that!  I was doing similar work then, with Blackboard systems,
> on the PROTEAN project at Stanford KSL,
> http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870014670.pdf  .
>
> Yes I've known about dynamic programming since about then. Good work, James
> -- I like your trick.
>
> -- V
>
> On Wed, Jul 13, 2016 at 3:02 PM, James Crook <[hidden email]> wrote:
>>
>> Sorry for the delay in getting back to you on this thread.
>>
>>
>> If you do use a dynamic programming approach, there is a neat trick I
>> invented (in context of DNA sequence matching) that caters for different
>> kinds of matching.  The trick is to run two 'match matrices' at the same
>> time, and have a penalty for switching between them.  This is excellent
>> where there is a mix of signal and noise, as in your test examples.  For
>> aligning noise you want a fairly sloppy not very precisely discriminating
>> comparison that is picking up broad characteristics.  What's great about
>> running two match matrices is that the algorithm naturally switches in to
>> using the best kind of matching for different sections.
>>
>>
>> On storage requirements, these can be reduced dramatically relative to
>> MATCH, even allowing large time shifts, by a divide and conquer approach.
>> Instead of allocating space length x max-shift you sample evenly and only
>> allocate space of k x max-shift for some small value of k such as 100.  The
>> cost is that you have to repeat the analysis log( length-of-sequence) times,
>> where log is to the base k.  So aligning to the nearest 10ms on two 1hr
>> sequences with a shift of up to 20 mins would take 50Mb storage (if one
>> match matrix) or 100Mb (with two in parallel), and the analysis would be
>> repeated 3 times.  Because you stay in cache in the analysis and write much
>> less to external memory it's a big net win both in storage and speed over a
>> single pass approach.
>>
>> I haven't written versions for sound.  This is extrapolating from back in
>> old times, in the late 80's when I was analysing DNA and protein sequences
>> on a PC with a fraction of the power and storage of modern PCs.  You had to
>> be inventive to get any decent performance at all.  This kind of trick can
>> pay off in a big way, even today.
>>
>> I can spell out in more detail if you might go down the dynamic
>> programming route, as I realise I have been a bit abbreviated in my
>> description here!
>>
>> --James.

Re: Automatic time-syncing feature

rbd
Just a comment on implementation: Nyquist has high-quality resampling,
and unlike most implementations that simply resample with some scale
factor, Nyquist allows you to construct a mapping from one clock to
another, e.g. if the signal is S, you can compute S(f(t)) where f(t) is
any monotonically increasing function (for example, to do a simple
speed-up, you can use f(t) = t * 1.01). In the implementation, f(t) is
actually a Nyquist Sound, so for example, if you had aligned points
every 10s, you could make a piece-wise linear function interpolating the
alignment points, thus compensating for clocks that are slowly changing
alignment points, thus compensating for clocks that are slowly changing
speed. Results are sub-sample accurate.
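
To illustrate the idea in the same terms, here is a plain C++ sketch of
the concept only (not Nyquist's implementation; it uses crude linear
interpolation where Nyquist's resampling is high quality and sub-sample
accurate, and the names AlignPoint, MapTime and Warp are invented for
the example): a piecewise-linear time map f(t) built from alignment
points, and the source signal read at f(t).

#include <cstddef>
#include <vector>

// One alignment point: output time -> input time, both in seconds.
// Points must be non-empty, sorted by outT, with inT monotonically increasing.
struct AlignPoint { double outT; double inT; };

// Piecewise-linear time map f(t) through the alignment points.
double MapTime(const std::vector<AlignPoint>& pts, double t) {
    if (t <= pts.front().outT) return pts.front().inT;
    for (std::size_t k = 1; k < pts.size(); ++k) {
        if (t <= pts[k].outT) {
            double u = (t - pts[k - 1].outT) / (pts[k].outT - pts[k - 1].outT);
            return pts[k - 1].inT + u * (pts[k].inT - pts[k - 1].inT);
        }
    }
    return pts.back().inT;
}

// Render S(f(t)): read the source signal `s` at the warped times.
std::vector<float> Warp(const std::vector<float>& s, double sampleRate,
                        const std::vector<AlignPoint>& pts, std::size_t outLen) {
    std::vector<float> out(outLen, 0.0f);
    if (s.size() < 2) return out;
    for (std::size_t n = 0; n < outLen; ++n) {
        double srcPos = MapTime(pts, n / sampleRate) * sampleRate;  // in samples
        if (srcPos < 0.0 || srcPos >= static_cast<double>(s.size()) - 1.0) continue;
        std::size_t i = static_cast<std::size_t>(srcPos);
        double frac = srcPos - static_cast<double>(i);
        out[n] = static_cast<float>((1.0 - frac) * s[i] + frac * s[i + 1]);
    }
    return out;
}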

Some thoughts about alignment: What happens if you have recordings from
different locations recording sources from different locations? There
may be no perfect alignment, e.g. in one recording, source A might be
earlier than source B, but in the other source B is before source A.
Does this cause alignment to jump to the loudest source and introduce a
lot of timing jitter?

(By the way, Nyquist's phase-vocoder works the same way, but in this
case resampling would be the right operation.)

-Roger



Re: Automatic time-syncing feature

Robert Hänggi
On 10/06/2017, Roger Dannenberg <[hidden email]> wrote:

> Just a comment on implementation: Nyquist has high-quality resampling,
> and unlike most implementations that simply resample with some scale
> factor, Nyquist allows you to construct a mapping from one clock to
> another, e.g. if the signal is S, you can compute S(f(t)) where f(t) is
> any monotonically increasing function (for example, to do a simple
> speed-up, you can use f(t) = t * 1.01). In the implementation, f(t) is
> actually a Nyquist Sound, so for example, if you had an aligned points
> every 10s, you could make a piece-wise linear function interpolating the
> alignment points, thus compensating for clocks that are slowly changing
> speed. Results are sub-sample accurate.
>
I often find that Audacity crashes when I use 'resample' or
'resamplev', especially when the selection is fairly long or when the
(static) factor exceeds about 1:19.

Robert


Re: Automatic time-syncing feature

Raphaël Marinier
In reply to this post by rbd
On Sat, Jun 10, 2017 at 4:54 PM, Roger Dannenberg <[hidden email]> wrote:

> Just a comment on implementation: Nyquist has high-quality resampling,
> and unlike most implementations that simply resample with some scale
> factor, Nyquist allows you to construct a mapping from one clock to
> another, e.g. if the signal is S, you can compute S(f(t)) where f(t) is
> any monotonically increasing function (for example, to do a simple
> speed-up, you can use f(t) = t * 1.01). In the implementation, f(t) is
> actually a Nyquist Sound, so for example, if you had an aligned points
> every 10s, you could make a piece-wise linear function interpolating the
> alignment points, thus compensating for clocks that are slowly changing
> speed. Results are sub-sample accurate.
> Some thoughts about alignment: What happens if you have recordings from
> different locations recording sources from different locations? There
> may be no perfect alignment, e.g. in one recording, source A might be
> earlier than source B, but in the other source B is before source A.
> Does this cause alignment to jump to the loudest source and introduce a
> lot of timing jitter?

I checked a few examples that have the property you mention. When doing local alignment (second phase of the algorithm) with very small windows (e.g. 1ms), I indeed see varying detected time differences at different positions in the two tracks. They seem to follow the loudest source. E.g. detected time differences hover between -20 and +20ms for two recordings ~15 meters apart, of sources ~10 meters apart (see this graph).

However, the algorithm performs a relatively coarse alignment: we fit an affine function to those time differences as a function of track time, and just apply this affine transformation globally to one of the tracks.
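
For illustration, here is a minimal sketch of such an affine fit (not the prototype's actual code; FitAffine is an invented name): ordinary least squares on the (track time, measured time difference) pairs from the local alignment phase, yielding a constant shift plus a drift slope. Fitting a single affine map, rather than a piecewise-linear one, keeps the correction to one global shift plus one resampling factor.

#include <cstddef>
#include <cstdio>
#include <vector>

// Result of fitting offset(t) ~= shift + drift * t to the measured
// (track time, time difference) pairs. A drift close to 0 means the two
// clocks agree; drift = 1e-4 corresponds to a 0.01% clock-speed difference.
struct AffineFit { double shift; double drift; };

AffineFit FitAffine(const std::vector<double>& t, const std::vector<double>& offset) {
    const double n = static_cast<double>(t.size());
    double st = 0, so = 0, stt = 0, sto = 0;
    for (std::size_t i = 0; i < t.size(); ++i) {
        st += t[i];
        so += offset[i];
        stt += t[i] * t[i];
        sto += t[i] * offset[i];
    }
    AffineFit f{0.0, 0.0};
    const double denom = n * stt - st * st;
    if (denom != 0.0) {
        f.drift = (n * sto - st * so) / denom;  // ordinary least squares slope
        f.shift = (so - f.drift * st) / n;      // and intercept
    } else if (n > 0.0) {
        f.shift = so / n;  // degenerate case: constant shift only
    }
    return f;
}

int main() {
    // Toy data: a 0.5s shift plus 0.01% drift, measured at 0, 10, 20, 30 minutes.
    std::vector<double> t = {0, 600, 1200, 1800};
    std::vector<double> offset = {0.5, 0.56, 0.62, 0.68};
    AffineFit f = FitAffine(t, offset);
    std::printf("shift = %.3f s, drift = %.6f\n", f.shift, f.drift);  // 0.500, 0.000100
    return 0;
}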

As you mention, we could of course fit a piece-wise linear function instead, but do we want to introduce this kind of varying time-stretching that jumps to the loudest source?

Thanks,

Raphaël

>
> (By the way, Nyquist's phase-vocoder works the same way, but in this
> case resampling would be the right operation.)
>
> -Roger
>
>
> On 6/10/17 6:51 AM, Raphaël Marinier wrote:
>> Hi all,
>>
>> After almost one year, I finally managed to spend some time on a
>> prototype implementation in Audacity, that aligns different recordings
>> of the same event.
>>
>> You can see the code there:
>> https://github.com/RaphaelMarinier/audacity/commit/3276106c66c35e390c8169d0ac9bfab22e352567
>>
>> The algorithm is as follows:
>> 1. Summarize each track by computing summary values on a sliding time
>> window. Typically the window is 25ms.
>> 2. Compute the cross-correlation between the summaries. This is done
>> in O(n log n) thanks to the FFT and convolution theorem.
>> 3. Find the best shift from the cross-correlation function.
>> 4. Split summaries into small chunks, and align them 1:1. This allows
>> detecting small clock speed differences between devices. It has been
>> tested successfully with 0.01% clock speed difference on 1h long
>> tracks.
>> 5. Apply the shift, and resample one track if need be.
>>
>> There are multiple algorithms and parameters that can be chosen at
>> each step, in particular regarding summarization of a window of audio
>> data, and finding the best peaks from the cross-correlation function.
>>
>> I created a benchmark out of few recordings, with a few automated
>> audio transformations (low pass, high pass, forced clock speed
>> difference, etc..). With the best parameters, I get about 96% success
>> rate out of 150 audio pairs.
>> The run time is pretty reasonable, taking less than 10s for 1h audio
>> tracks on a recent laptop (plus resample time if it happens), memory
>> requirements are very small (on the order of 3MBs for two 1h tracks).
>>
>> Would you like to have this in Audacity? If yes, what would be the
>> best way to integrate it? Note that we need to be able to shift tracks
>> by some offset, and resample them if need be. Does any plugin system
>> allow shifting the tracks without having to rewrite the samples?
>> Should this feature just be integrated as an ad-hoc internal audacity
>> feature (for example shown in the Tracks menu)?
>>
>> There are of course some limitations that should still be addressed:
>> - Sync lock track group handling.
>> - Alignment uses left channel only. We might want to make this configurable.
>> - If the time drift is very small, we may want to avoid resampling tracks.
>> - We could use a much smaller time window in the second alignment
>> phase. This could make the alignment more precise, while still keeping
>> the algorithm fast.
>>
>> The benchmarking code is completely ad-hoc, it would also be great to
>> find a way to run this kind of automated benchmarks in a uniform way
>> across Audacity code base (I guess other parts of Audacity could
>> benefit as well).
>>
>> James, thanks for your algorithmic suggestions. For now I went the
>> route of using a mix of global and local cross-correlation.
>>
>> Raphaël
>>
>> On Thu, Jul 14, 2016 at 12:26 AM, Vaughan Johnson <[hidden email]> wrote:
>>> James: "This is extrapolating from back in old times, in the late 80's when
>>> I was analysing DNA and protein sequences..."
>>>
>>>
>>>
>>> Didn't know that!  I was doing similar work then, with Blackboard systems,
>>> on the PROTEAN project at Stanford KSL,
>>> http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870014670.pdf  .
>>>
>>> Yes I've known about dynamic programming since about then. Good work, James
>>> -- I like your trick.
>>>
>>> -- V
>>>
>>> On Wed, Jul 13, 2016 at 3:02 PM, James Crook <[hidden email]> wrote:
>>>> Sorry for the delay in getting back to you on this thread.
>>>>
>>>>
>>>> If you do use a dynamic programming approach, there is a neat trick I
>>>> invented (in context of DNA sequence matching) that caters for different
>>>> kinds of matching.  The trick is to run two 'match matrices' at the same
>>>> time, and have a penalty for switching between them.  This is excellent
>>>> where there is a mix of signal and noise, as in your test examples.  For
>>>> aligning noise you want a fairly sloppy not very precisely discriminating
>>>> comparison that is picking up broad characteristics.  What's great about
>>>> running two match matrices is that the algorithm naturally switches in to
>>>> using the best kind of matching for different sections.
>>>>
>>>>
>>>> On storage requirements, these can be reduced dramatically relative to
>>>> MATCH, even allowing large time shifts, by a divide and conquer approach.
>>>> Instead of allocating space length x max-shift you sample evenly and only
>>>> allocate space of k x max-shift for some small value of k such as 100.  The
>>>> cost is that you have to repeat the analysis log( length-of-sequence) times,
>>>> where log is to the base k.  So aligning to the nearest 10ms on two 1hr
>>>> sequences with a shift of up to 20 mins would take 50Mb storage (if one
>>>> match matrix) or 100Mb (with two in parallel), and the analysis would be
>>>> repeated 3 times.  Because you stay in cache in the analysis and write much
>>>> less to external memory it's a big net win both in storage and speed over a
>>>> single pass approach.
>>>>
>>>> I haven't written versions for sound.  This is extrapolating from back in
>>>> old times, in the late 80's when I was analysing DNA and protein sequences
>>>> on a PC with a fraction of the power and storage of modern PCs.  You had to
>>>> be inventive to get any decent performance at all.  This kind of trick can
>>>> pay off in a big way, even today.
>>>>
>>>> I can spell out in more detail if you might go down the dynamic
>>>> programming route, as I realise I have been a bit abbreviated in my
>>>> description here!
>>>>
>>>> --James.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 7/7/2016 11:00 PM, Raphaël Marinier wrote:
>>>>
>>>> Thanks for the information.
>>>>
>>>> I did some testing of the MATCH vamp plugin, running it via sonic
>>>> analyzer, which integrates it already.
>>>>
>>>> First of all, the algorithm is pretty expensive, and its runtime seems
>>>> linear in the max time shift allowed. For aligning two 1h tracks, with a max
>>>> allowed time shift of 60s, it takes 6 minutes on a recent processor (Intel
>>>> i5-5200U), and takes about 8GB of RAM. Using is for largeer time shifts such
>>>> as 10 minutes will be quite expensive...
>>>>
>>>> I also tested the quality of the results, to the extent sonic-analyzer
>>>> allowed me - it can only report graphical results of the alignment analysis,
>>>> but does not actually align the tracks.
>>>>
>>>> (1)  2 identical audio tracks of a recorded concert, with a time-shift of
>>>> about 15s between them.
>>>> Alignment seems perfect.
>>>>
>>>> (2) 2 identical audio tracks of a recorded concert, except for a 30s hole
>>>> filled with pink noise, with a time-shift of about 15s between them.
>>>> There are 1-2 second zones at the boundaries of the hole where the audio
>>>> is wrongly aligned. This will be quite problematic when building a feature
>>>> that allows mix and matching different versions of each passage.
>>>>
>>>> (3) 2 audio tracks recorded from the same concert (left right channels
>>>> from same device), except for a 30s hole filled with pink noise, with a
>>>> time-shift of about 15s between them.
>>>> Sames issues as (2), no new issues.
>>>>
>>>> (4) 2 audio tracks of the same concert, recorded with 2 different devices.
>>>> Throughout the match, it finds ratios of tempos that are as divergent as
>>>> <0.8 or >1.2 a significant fraction of the time. This is pretty bad since a
>>>> correct match should find a tempo ratio of 1 throughout the recording.
>>>> Things can be improved with non-default parameters, namely lowering the cost
>>>> of the diagonal to 1.5 and enabling the "path smoothing" feature, but the
>>>> tempo ratio still routinely hovers around 0.9 - 1.1.
>>>>
>>>> (5) 2 recordings of two performances of the same composition, time shift
>>>> of about 15s, and hole of about 30s.
>>>> Default parameters lead to big issues at boundaries around the hole (10s
>>>> and 30s of incorrect matches).
>>>> However, using non-default cost for diagonal again significantly improves
>>>> the match by mostly fixing the boundaries around the hole. There is still a
>>>> small issue with the first 0.5s of the performance that remains incorrectly
>>>> matched.
>>>> I cannot really evaluate the match more than that, because sonic-analyzer
>>>> just produces the graphs, but does not actually match the tracks.
>>>>
>>>> My conclusion is that the match plugin cannot be used that easily, even
>>>> for the simple case of 2 recordings of the same event, because of accuracy
>>>> and performance. The former could be fixable by imposing stronger regularity
>>>> of the path (e.g. piecewise linear). The latter might be harder.
>>>>
>>>> I propose to start working on an algorithm and feature specific to the
>>>> case of 2 recordings of the same event, which is an easier case to start
>>>> with both in terms of algorithm and UI.
>>>> I also agree that we won't be able to align perfectly, in particular
>>>> because of stereo. All we can do is best-effort given the sources. I will
>>>> allow for piecewise linear ratios between clock frequencies (with additional
>>>> regularity restrictions), to account for varying clock drifts.
>>>>
>>>> Cheers,
>>>>
>>>> --
>>>> Raphaël
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 27, 2016 at 9:19 AM, Robert Hänggi <[hidden email]>
>>>> wrote:
>>>>> Hi
>>>>> Incidentally, I've just stumbled over a real-life example where this
>>>>> alignment would really be of great use to me.
>>>>> I'm modelling a CD4 demodulation plug-in.
>>>>> For the background see:
>>>>> http://forum.audacityteam.org/viewtopic.php?p=307553#p307553
>>>>> There are also two test (calibration) recordings in this specific post.
>>>>>
>>>>> In essence, four tracks are embedded in a single stereo track.
>>>>> The aim is to reverse-engineer what is in a hardware phono demodulator.
>>>>> I can demodulate the signal; however, there are some difficulties in
>>>>> properly aligning it with the base audio:
>>>>> Base left=LFront + LBack (for normal stereo playback)
>>>>> FM Left= LFront - LBack
>>>>> (ditto for right)
>>>>> Thus, I can't simply align them until they cancel.
>>>>> What's more, the frequencies do not match exactly because we have RIAA
>>>>> in combination with a noise reduction expander, a delay caused by the
>>>>> low/high pass filter etc.
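For reference, once the two signals are aligned and gain-matched the sum/difference relation above inverts trivially; a sketch assuming NumPy arrays, and ignoring the RIAA/expander and filter-delay issues just mentioned:

    import numpy as np

    def split_cd4(base_left, fm_left):
        # base_left = l_front + l_back, fm_left = l_front - l_back, so the
        # front/back channels fall out by sum and difference.
        base_left = np.asarray(base_left, dtype=float)
        fm_left = np.asarray(fm_left, dtype=float)
        l_front = 0.5 * (base_left + fm_left)
        l_back = 0.5 * (base_left - fm_left)
        return l_front, l_back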
>>>>>
>>>>> In summary, the alignment had to be very exact but at the same time
>>>>> insensitive to noise, phase & amplitude deviations, and on and on...
>>>>> For the moment, I will use cross-correlation and least-squares fitting
>>>>> for certain "anchor" points.
>>>>> I look forward to seeing the aligning feature someday implemented in
>>>>> Audacity. Good luck.
>>>>>
>>>>> Cheers
>>>>> Robert
>>>>>
>>>>>
>>>>> 2016-06-27 2:38 GMT+02:00, Roger Dannenberg <[hidden email]>:
>>>>>> Excellent point. Also, aligning anything to a stereo track will
>>>>>> generate
>>>>>> similar problems. I would suggest that if you're recording with
>>>>>> multiple
>>>>>> microphones and devices, you're guaranteed to hit phase and multiple
>>>>>> source problems. In the spirit of the "principle of least surprise" I
>>>>>> would expect an alignment effect to just do a reasonable job given the
>>>>>> sources. E.g. if acoustic sources are spread over 10 meters (~30ms at
>>>>>> the speed of sound), I'd hope individual sources would be aligned
>>>>>> within
>>>>>> 30ms. If there were a single source, I'd hope for much better.
>>>>>>
>>>>>> Another possibility is aligning to multiple tracks representing the
>>>>>> same
>>>>>> collection of sound sources recorded from different locations. It's
>>>>>> subtly different from aligning to a single track.
>>>>>>
>>>>>> -Roger
>>>>>>
>>>>>> On 6/26/16 7:01 PM, James Crook wrote:
>>>>>>> Something else to think about is what happens if you attempt to align
>>>>>>> two mono tracks that happen actually to be left and right audio of a
>>>>>>> stereo track.
>>>>>>
>>>>
>>>>
>>>>
Re: Automatic time-syncing feature

rbd

Here are some random thoughts: It would make sense to compute alignment at many points and do some sort of smoothing. You might ask (and try to solve): What alignment function minimizes the sum-of-squares of alignment errors, considering only *plausible* alignment functions, i.e. those that could be produced by real crystal clocks? I'm not even sure of a reasonable model for clock drift, but one approach might be to just take the alignment function, treat it as a signal and low-pass it. The cut-off frequency would be very low, a tiny fraction of 1 Hz, and you'd have to be careful not to introduce phase shift or lag: The standard trick is to run an IIR filter over the signal, reverse it, filter it again, and reverse it again, so that phase shifts or lags cancel. I think getting the start and end of the signal right, i.e. initializing the filter state, is also tricky. Another approach might be least-squares regression to fit a higher-order polynomial rather than a line to the data. At least, it seems that linear regression over a bunch of alignment points would do a good job assuming clocks are stable and just running at slightly different speeds.
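A minimal sketch of that forward-backward filtering, assuming SciPy, alignment offsets sampled at roughly regular intervals, and placeholder values for the cut-off and filter order (filtfilt does the forward-backward pass and the end padding):

    import numpy as np
    from scipy.signal import butter, filtfilt

    def smooth_offsets(times, offsets, cutoff_hz=0.001):
        # Zero-phase low-pass of the measured offset-vs-time curve: the
        # signal is filtered forward and backward so no lag is introduced.
        dt = np.median(np.diff(times))            # spacing of alignment points
        b, a = butter(2, cutoff_hz, btype="low", fs=1.0 / dt)
        return filtfilt(b, a, offsets)            # needs more than a handful of points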

-Roger


On 6/17/17 6:49 PM, Raphaël Marinier wrote:
On Sat, Jun 10, 2017 at 4:54 PM, Roger Dannenberg <[hidden email]> wrote:
> Just a comment on implementation: Nyquist has high-quality resampling,
> and unlike most implementations that simply resample with some scale
> factor, Nyquist allows you to construct a mapping from one clock to
> another, e.g. if the signal is S, you can compute S(f(t)) where f(t) is
> any monotonically increasing function (for example, to do a simple
> speed-up, you can use f(t) = t * 1.01). In the implementation, f(t) is
> actually a Nyquist Sound, so for example, if you had aligned points
> every 10s, you could make a piece-wise linear function interpolating the
> alignment points, thus compensating for clocks that are slowly changing
> speed. Results are sub-sample accurate.
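As a rough illustration of what that mapping gives you (plain NumPy, not Nyquist, and using low-quality linear interpolation where Nyquist would resample properly): build a piece-wise linear f from the alignment anchors and evaluate S(f(t)):

    import numpy as np

    def warp_signal(signal, sr, anchor_t, anchor_ft):
        # anchor_t -> anchor_ft are matching times (seconds) in the output
        # and input clocks; f is their piece-wise linear interpolation.
        signal = np.asarray(signal, dtype=float)
        t = np.arange(len(signal)) / sr           # output sample times
        f_t = np.interp(t, anchor_t, anchor_ft)   # mapped input times
        src_t = np.arange(len(signal)) / sr       # input sample times
        return np.interp(f_t, src_t, signal)      # crude stand-in for S(f(t))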
> Some thoughts about alignment: What happens if you have recordings from
> different locations recording sources from different locations? There
> may be no perfect alignment, e.g. in one recording, source A might be
> earlier than source B, but in the other source B is before source A.
> Does this cause alignment to jump to the loudest source and introduce a
> lot of timing jitter?

I checked a few examples that have the property you mention. When doing local alignment (second phase of the algorithm) with very small windows (e.g. 1ms), I indeed see varying detected time differences at different positions in the two tracks. They seem to follow the loudest source. E.g. detected time differences hover between -20 and +20ms for two recordings ~15 meters apart, of sources ~10 meters apart (see this graph: https://drive.google.com/file/d/0B7V5I4sAuUdfNDNsaWYyZGFQeWM/view?usp=sharing).

However, the algorithm performs relatively coarse alignment. We fit an affine function on those time differences vs track time, and just apply this affine transformation globally to one of the tracks.

As you mention, we could of course fit a piece-wise linear function instead, but do we want to introduce this kind of varying time-stretching that jumps to the loudest source?

Thanks,

Raphaël

>
> (By the way, Nyquist's phase-vocoder works the same way, but in this
> case resampling would be the right operation.)
>
> -Roger
>
>
> On 6/10/17 6:51 AM, Raphaël Marinier wrote:
>> Hi all,
>>
>> After almost one year, I finally managed to spend some time on a
>> prototype implementation in Audacity, that aligns different recordings
>> of the same event.
>>
>> You can see the code there:
>> https://github.com/RaphaelMarinier/audacity/commit/3276106c66c35e390c8169d0ac9bfab22e352567
>>
>> The algorithm is as follows:
>> 1. Summarize each track by computing summary values on a sliding time
>> window. Typically the window is 25ms.
>> 2. Compute the cross-correlation between the summaries. This is done
>> in O(n log n) thanks to the FFT and convolution theorem.
>> 3. Find the best shift from the cross-correlation function.
>> 4. Split summaries into small chunks, and align them 1:1. This allows
>> detecting small clock speed differences between devices. It has been
>> tested successfully with 0.01% clock speed difference on 1h long
>> tracks.
>> 5. Apply the shift, and resample one track if need be.
>>
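A compact sketch of steps 2-3 (cross-correlating the window summaries via FFT and picking the peak), assuming NumPy and the 25ms hop mentioned above; this illustrates the technique and is not the code from the linked commit:

    import numpy as np

    def best_shift_seconds(summary_a, summary_b, hop=0.025):
        # Zero-padded FFT cross-correlation of the two summary sequences,
        # O(n log n).  A positive result means the common content appears
        # that much later in summary_a than in summary_b.
        summary_a = np.asarray(summary_a, dtype=float)
        summary_b = np.asarray(summary_b, dtype=float)
        la, lb = len(summary_a), len(summary_b)
        nfft = 1 << (la + lb - 2).bit_length()    # next power of two >= la+lb-1
        A = np.fft.rfft(summary_a - np.mean(summary_a), nfft)
        B = np.fft.rfft(summary_b - np.mean(summary_b), nfft)
        xc = np.fft.irfft(A * np.conj(B), nfft)
        lags = np.concatenate((np.arange(la), np.arange(-(lb - 1), 0)))
        vals = np.concatenate((xc[:la], xc[nfft - (lb - 1):]))
        return lags[np.argmax(vals)] * hop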
>> There are multiple algorithms and parameters that can be chosen at
>> each step, in particular regarding summarization of a window of audio
>> data, and finding the best peaks from the cross-correlation function.
>>
>> I created a benchmark out of a few recordings, with a few automated
>> audio transformations (low pass, high pass, forced clock speed
>> difference, etc.). With the best parameters, I get about 96% success
>> rate out of 150 audio pairs.
>> The run time is pretty reasonable, taking less than 10s for 1h audio
>> tracks on a recent laptop (plus resample time if needed); memory
>> requirements are very small (on the order of 3 MB for two 1h tracks).
>>
>> Would you like to have this in Audacity? If yes, what would be the
>> best way to integrate it? Note that we need to be able to shift tracks
>> by some offset, and resample them if need be. Does any plugin system
>> allow shifting the tracks without having to rewrite the samples?
>> Should this feature just be integrated as an ad-hoc internal Audacity
>> feature (for example shown in the Tracks menu)?
>>
>> There are of course some limitations that should still be addressed:
>> - Sync lock track group handling.
>> - Alignment uses left channel only. We might want to make this configurable.
>> - If the time drift is very small, we may want to avoid resampling tracks.
>> - We could use a much smaller time window in the second alignment
>> phase. This could make the alignment more precise, while still keeping
>> the algorithm fast.
>>
>> The benchmarking code is completely ad-hoc, it would also be great to
>> find a way to run this kind of automated benchmarks in a uniform way
>> across Audacity code base (I guess other parts of Audacity could
>> benefit as well).
>>
>> James, thanks for your algorithmic suggestions. For now I went the
>> route of using a mix of global and local cross-correlation.
>>
>> Raphaël
>>
Re: Automatic time-syncing feature

James Crook
In reply to this post by Raphaël Marinier
It all depends on what you are doing the alignment for.

If the assumption is that it is clock drift/different clocks, then you
need a different model for scoring alignments than if you are aligning
multiple takes of the same song, and a different one again for
recordings of the same performance by different microphones.

For recordings of the same performance by different microphones, you have
to have some model for the sources/reverb.  In a sense the alignment is
then deciding on both the most probable alignment and the most probable
model parameters at the same time.

--James.


Re: Automatic time-syncing feature

Raphaël Marinier
In reply to this post by rbd
On Sun, Jun 18, 2017 at 4:57 AM, Roger Dannenberg <[hidden email]> wrote:

Here are some random thoughts: It would make sense to compute alignment at many points and do some sort of smoothing. You might ask (and try to solve): What alignment function minimizes the sum-of-squares of alignment errors, considering only *plausible* alignment functions, i.e. those that could be produced by real crystal clocks? I'm not even sure of a reasonable model for clock drift, but one approach might be to just take the alignment function, treat it as a signal and low-pass it. The cut-off frequency would be very low, a tiny fraction of 1 Hz, and you'd have to be careful not to introduce phase shift or lag: The standard trick is to run an IIR filter over the signal, reverse it, filter it again, and reverse it again, so that phase shifts or lags cancel. I think getting the start and end of the signal right, i.e. initializing the filter state, is also tricky. Another approach might be least-squares regression to fit a higher-order polynomial rather than a line to the data. At least, it seems that linear regression over a bunch of alignment points would do a good job assuming clocks are stable and just running at slightly different speeds.


Note that the current algorithm already does a linear regression over multiple alignment points. This indeed corrects for slightly different clock speeds, assuming the speed differences are stable.

I'll go further and fit continuous piece-wise linear functions, to catch unstable clock differences. I'll place the knots of the function ~10 minutes apart.

When we evaluate the alignment at many positions, some detected time differences will be completely wrong. It can be because the algorithm did not succeed (e.g. the time window considered is mostly filled with silence), or because the two tracks only partially overlap. The model has to be robust to those outliers and nonsensical values. The more complex the function we fit, the harder this is, so I am very much in favor of keeping the fitting function as simple as possible.
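A concrete (if simplistic) version of that robustness requirement, assuming NumPy; the clipping threshold and iteration count are placeholders, and the same idea extends to a piece-wise linear fit with knots:

    import numpy as np

    def fit_affine_robust(track_time, time_diff, n_iter=5, clip_sigma=3.0):
        # Fit time_diff ~ a * track_time + b by least squares, then repeatedly
        # drop points far from the fit (sigma clipping) so that windows where
        # detection failed, or where the tracks do not overlap, cannot pull
        # the line.
        t = np.asarray(track_time, dtype=float)
        d = np.asarray(time_diff, dtype=float)
        keep = np.ones(len(t), dtype=bool)
        for _ in range(n_iter):
            a, b = np.polyfit(t[keep], d[keep], 1)
            resid = d - (a * t + b)
            sigma = np.std(resid[keep])
            if sigma == 0:
                break
            keep = np.abs(resid) < clip_sigma * sigma
        return a, b, keep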

Regarding the resampling, I've seen that soxr, currently used by Audacity, supports piece-wise linear functions, and it seems straightforward to use from Audacity's code. If we wanted to do this with Nyquist, I'd have to execute Nyquist instructions from my code, which seems rather more complicated. Is there an easy way to do it from code that I missed?

Raphaël


-Roger


On 6/17/17 6:49 PM, Raphaël Marinier wrote:
On Sat, Jun 10, 2017 at 4:54 PM, Roger Dannenberg <[hidden email]> wrote:
> Just a comment on implementation: Nyquist has high-quality resampling,
> and unlike most implementations that simply resample with some scale
> factor, Nyquist allows you to construct a mapping from one clock to
> another, e.g. if the signal is S, you can compute S(f(t)) where f(t) is
> any monotonically increasing function (for example, to do a simple
> speed-up, you can use f(t) = t * 1.01). In the implementation, f(t) is
> actually a Nyquist Sound, so for example, if you had an aligned points
> every 10s, you could make a piece-wise linear function interpolating the
> alignment points, thus compensating for clocks that are slowly changing
> speed. Results are sub-sample accurate.
> Some thoughts about alignment: What happens if you have recordings from
> different locations recording sources from different locations? There
> may be no perfect alignment, e.g. in one recording, source A might be
> earlier than source B, but in the other source B is before source A.
> Does this cause alignment to jump to the loudest source and introduce a
> lot of timing jitter?

I checked a few examples that have the property you mention. When doing local alignment (second phase of the algorithm) with very small windows (e.g. 1ms), I indeed see varying detected time differences at different positions in the two tracks. They seem to follow the loudest source. E.g. detected time differences hover between -20 and +20ms for two recordings ~15 meters apart, of sources ~10 meters apart (see this graph)

However, the algorithm performs relatively coarse alignment. We fit an affine function on those time differences vs track time, and just apply this affine transformation globally to one of the tracks.

As you mention, we could of course fit a piece-wise linear function instead, but do we want to introduce this kind of varying time-stretching that jumps to the loudest source?

Thanks,

Raphaël

>
> (By the way, Nyquist's phase-vocoder works the same way, but in this
> case resampling would be the right operation.)
>
> -Roger
>
>
> On 6/10/17 6:51 AM, Raphaël Marinier wrote:
>> Hi all,
>>
>> After almost one year, I finally managed to spend some time on a
>> prototype implementation in Audacity, that aligns different recordings
>> of the same event.
>>
>> You can see the code there:
>> https://github.com/RaphaelMarinier/audacity/commit/3276106c66c35e390c8169d0ac9bfab22e352567
>>
>> The algorithm is as follows:
>> 1. Summarize each track by computing summary values on a sliding time
>> window. Typically the window is 25ms.
>> 2. Compute the cross-correlation between the summaries. This is done
>> in O(n log n) thanks to the FFT and convolution theorem.
>> 3. Find the best shift from the cross-correlation function.
>> 4. Split summaries into small chunks, and align them 1:1. This allows
>> detecting small clock speed differences between devices. It has been
>> tested successfully with 0.01% clock speed difference on 1h long
>> tracks.
>> 5. Apply the shift, and resample one track if need be.
>>
>> There are multiple algorithms and parameters that can be chosen at
>> each step, in particular regarding summarization of a window of audio
>> data, and finding the best peaks from the cross-correlation function.
>>
>> I created a benchmark out of few recordings, with a few automated
>> audio transformations (low pass, high pass, forced clock speed
>> difference, etc..). With the best parameters, I get about 96% success
>> rate out of 150 audio pairs.
>> The run time is pretty reasonable, taking less than 10s for 1h audio
>> tracks on a recent laptop (plus resample time if it happens), memory
>> requirements are very small (on the order of 3MBs for two 1h tracks).
>>
>> Would you like to have this in Audacity? If yes, what would be the
>> best way to integrate it? Note that we need to be able to shift tracks
>> by some offset, and resample them if need be. Does any plugin system
>> allow shifting the tracks without having to rewrite the samples?
>> Should this feature just be integrated as an ad-hoc internal audacity
>> feature (for example shown in the Tracks menu)?
>>
>> There are of course some limitations that should still be addressed:
>> - Sync lock track group handling.
>> - Alignment uses left channel only. We might want to make this configurable.
>> - If the time drift is very small, we may want to avoid resampling tracks.
>> - We could use a much smaller time window in the second alignment
>> phase. This could make the alignment more precise, while still keeping
>> the algorithm fast.
>>
>> The benchmarking code is completely ad-hoc, it would also be great to
>> find a way to run this kind of automated benchmarks in a uniform way
>> across Audacity code base (I guess other parts of Audacity could
>> benefit as well).
>>
>> James, thanks for your algorithmic suggestions. For now I went the
>> route of using a mix of global and local cross-correlation.
>>
>> Raphaël
>>
>> On Thu, Jul 14, 2016 at 12:26 AM, Vaughan Johnson <[hidden email]> wrote:
>>> James: "This is extrapolating from back in old times, in the late 80's when
>>> I was analysing DNA and protein sequences..."
>>>
>>>
>>>
>>> Didn't know that!  I was doing similar work then, with Blackboard systems,
>>> on the PROTEAN project at Stanford KSL,
>>> http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870014670.pdf  .
>>>
>>> Yes I've known about dynamic programming since about then. Good work, James
>>> -- I like your trick.
>>>
>>> -- V
>>>
>>> On Wed, Jul 13, 2016 at 3:02 PM, James Crook <[hidden email]> wrote:
>>>> Sorry for the delay in getting back to you on this thread.
>>>>
>>>>
>>>> If you do use a dynamic programming approach, there is a neat trick I
>>>> invented (in context of DNA sequence matching) that caters for different
>>>> kinds of matching.  The trick is to run two 'match matrices' at the same
>>>> time, and have a penalty for switching between them.  This is excellent
>>>> where there is a mix of signal and noise, as in your test examples.  For
>>>> aligning noise you want a fairly sloppy not very precisely discriminating
>>>> comparison that is picking up broad characteristics.  What's great about
>>>> running two match matrices is that the algorithm naturally switches in to
>>>> using the best kind of matching for different sections.
>>>>
>>>>
>>>> On storage requirements, these can be reduced dramatically relative to
>>>> MATCH, even allowing large time shifts, by a divide and conquer approach.
>>>> Instead of allocating space length x max-shift you sample evenly and only
>>>> allocate space of k x max-shift for some small value of k such as 100.  The
>>>> cost is that you have to repeat the analysis log( length-of-sequence) times,
>>>> where log is to the base k.  So aligning to the nearest 10ms on two 1hr
>>>> sequences with a shift of up to 20 mins would take 50Mb storage (if one
>>>> match matrix) or 100Mb (with two in parallel), and the analysis would be
>>>> repeated 3 times.  Because you stay in cache in the analysis and write much
>>>> less to external memory it's a big net win both in storage and speed over a
>>>> single pass approach.
>>>>
>>>> I haven't written versions for sound.  This is extrapolating from back in
>>>> old times, in the late 80's when I was analysing DNA and protein sequences
>>>> on a PC with a fraction of the power and storage of modern PCs.  You had to
>>>> be inventive to get any decent performance at all.  This kind of trick can
>>>> pay off in a big way, even today.
>>>>
>>>> I can spell out in more detail if you might go down the dynamic
>>>> programming route, as I realise I have been a bit abbreviated in my
>>>> description here!
>>>>
>>>> --James.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 7/7/2016 11:00 PM, Raphaël Marinier wrote:
>>>>
>>>> Thanks for the information.
>>>>
>>>> I did some testing of the MATCH vamp plugin, running it via sonic
>>>> analyzer, which integrates it already.
>>>>
>>>> First of all, the algorithm is pretty expensive, and its runtime seems
>>>> linear in the max time shift allowed. For aligning two 1h tracks, with a max
>>>> allowed time shift of 60s, it takes 6 minutes on a recent processor (Intel
>>>> i5-5200U), and takes about 8GB of RAM. Using is for largeer time shifts such
>>>> as 10 minutes will be quite expensive...
>>>>
>>>> I also tested the quality of the results, to the extent sonic-analyzer
>>>> allowed me - it can only report graphical results of the alignment analysis,
>>>> but does not actually align the tracks.
>>>>
>>>> (1)  2 identical audio tracks of a recorded concert, with a time-shift of
>>>> about 15s between them.
>>>> Alignment seems perfect.
>>>>
>>>> (2) 2 identical audio tracks of a recorded concert, except for a 30s hole
>>>> filled with pink noise, with a time-shift of about 15s between them.
>>>> There are 1-2 second zones at the boundaries of the hole where the audio
>>>> is wrongly aligned. This will be quite problematic when building a feature
>>>> that allows mix and matching different versions of each passage.
>>>>
>>>> (3) 2 audio tracks recorded from the same concert (left right channels
>>>> from same device), except for a 30s hole filled with pink noise, with a
>>>> time-shift of about 15s between them.
>>>> Sames issues as (2), no new issues.
>>>>
>>>> (4) 2 audio tracks of the same concert, recorded with 2 different devices.
>>>> Throughout the match, it finds ratios of tempos that are as divergent as
>>>> <0.8 or >1.2 a significant fraction of the time. This is pretty bad since a
>>>> correct match should find a tempo ratio of 1 throughout the recording.
>>>> Things can be improved using non-default parameters of lowering the cost of
>>>> the diagonal to 1.5, and enabling the "path smoothing" feature, but tempo
>>>> ratio still routinely hovers around 0.9 - 1.1.
>>>>
>>>> (5) 2 recordings of two performances of the same composition, time shift
>>>> of about 15s, and hole of about 30s.
>>>> Default parameters lead to big issues at boundaries around the hole (10s
>>>> and 30s of incorrect matches).
>>>> However, using non-default cost for diagonal again significantly improves
>>>> the match by mostly fixing the boundaries around the hole. There is still a
>>>> small issue with the first 0.5s of the performance that remains incorrectly
>>>> matched.
>>>> I cannot really evaluate the match more than that, because sonic-analyzer
>>>> just produces the graphs, but does not actually match the tracks.
>>>>
>>>> My conclusion is that the match plugin cannot be used that easily, even
>>>> for the simple case of 2 recordings of the same event, because of accuracy
>>>> and performance. The former could be fixable by imposing stronger regularity
>>>> of the path (e.g. piecewise linear). The latter might be harder.
>>>>
>>>> I propose to start working on an algorithm and feature specific to the
>>>> case of 2 recordings of the same event, which is an easier case to start
>>>> with both in terms of algorithm and UI.
>>>> I also agree that we won't be able to align perfectly, in particular
>>>> because of stereo. All we can do is best-effort given the sources. I will
>>>> allow for piecewise linear ratios between frequencies (with additional
>>>> regularity restrictions), to account for varying clock drifts.
>>>>
>>>> Cheers,
>>>>
>>>> --
>>>> Raphaël
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 27, 2016 at 9:19 AM, Robert Hänggi <[hidden email]>
>>>> wrote:
>>>>> Hi
>>>>> Incidentally, I've just stumbled over a real-life example where this
>>>>> alignment would really be of great use to me.
>>>>> I'm modelling a CD4 demodulation plug-in.
>>>>> For the background see:
>>>>> http://forum.audacityteam.org/viewtopic.php?p=307553#p307553
>>>>> There are also two test (calibration) recordings in this specific post.
>>>>>
>>>>> In essence, four tracks are embedded in a single stereo track.
>>>>> The aim is to reverse-engineer what is in a hardware phono demodulator.
>>>>> I can demodulate the signal, however, there are some difficulties in
>>>>> proper aligning it with the base audio:
>>>>> Base left=LFront + LBack (for normal stereo playback)
>>>>> FM Left= LFront - LBack
>>>>> (ditto for right)
>>>>> Thus, I can't simply align them until they cancel.
>>>>> What's more, the frequencies do not match exactly because we have RIAA
>>>>> in combination with a noise reduction expander, a delay caused by the
>>>>> low/high pass filter etc.
>>>>>
>>>>> In summary, the alignment has to be very exact, but at the same time
>>>>> insensitive to noise, phase and amplitude deviations, and so on...
>>>>> For the moment, I will use cross-correlation and least-squares fitting
>>>>> at certain "anchor" points.
>>>>> I look forward to seeing the aligning feature someday implemented in
>>>>> Audacity. Good luck.
>>>>>
>>>>> Cheers
>>>>> Robert
>>>>>
>>>>>
>>>>> 2016-06-27 2:38 GMT+02:00, Roger Dannenberg <[hidden email]>:
>>>>>> Excellent point. Also, aligning anything to a stereo track will generate
>>>>>> similar problems. I would suggest that if you're recording with multiple
>>>>>> microphones and devices, you're guaranteed to hit phase and
>>>>>> multiple-source problems. In the spirit of the "principle of least
>>>>>> surprise", I would expect an alignment effect to just do a reasonable job
>>>>>> given the sources. E.g. if acoustic sources are spread over 10 meters
>>>>>> (~30ms at the speed of sound), I'd hope individual sources would be
>>>>>> aligned within 30ms. If there were a single source, I'd hope for much
>>>>>> better.
>>>>>>
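>>>>>> (For the record, the ~30ms figure follows from the speed of sound,
>>>>>> roughly 343 m/s at room temperature:
>>>>>>   \frac{10\,\mathrm{m}}{343\,\mathrm{m/s}} \approx 0.029\,\mathrm{s}
>>>>>>   \approx 30\,\mathrm{ms}. )
>>>>>>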
>>>>>> Another possibility is aligning to multiple tracks representing the same
>>>>>> collection of sound sources recorded from different locations. It's
>>>>>> subtly different from aligning to a single track.
>>>>>>
>>>>>> -Roger
>>>>>>
>>>>>> On 6/26/16 7:01 PM, James Crook wrote:
>>>>>>> Something else to think about is what happens if you attempt to align
>>>>>>> two mono tracks that happen actually to be left and right audio of a
>>>>>>> stereo track.
>>>>>>
>>>>
>>>>
>>>>