Development of generic sample synchronisation tool
rpp3po
After C.R. Helmrich's finding that Foobar's ABX comparator isn't necessarily time-synced for some material, I have started to develop a generic, cross-platform sample synchronization tool/lib that can be used for exact time syncing prior to ABX testing. It is not supposed to depend on gapless tag information, but to mimic the manual steps Helmrich did in Audition.

The program is finished so far, but there is a problem. The JUnit test cases work fine. They process pairs of WAV files, one with a 1000-sample offset and one without, and check for equality after removing the preceding samples, to make sure the relevant parts aren't touched.

The algorithm is still pretty simple and works like this: Count the original WAV's leading empty samples. That's the part of the prefix that should not be cut. Count the target's leading empty samples. Now subtract the first count from the second to get the target's delay. Then strip the delay from the target.
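
A minimal sketch of these steps, assuming mono samples already decoded into int arrays and "empty" meaning a sample value of exactly 0. The class and method names are made up for illustration and are not taken from the actual tool.

```java
import java.util.Arrays;

/** Sketch of the leading-silence approach; all names here are hypothetical. */
final class LeadingSilenceSync {

    /** Counts the zero-valued samples at the start of the signal. */
    static int countLeadingZeros(int[] samples) {
        int n = 0;
        while (n < samples.length && samples[n] == 0) {
            n++;
        }
        return n;
    }

    /** Delay of the target relative to the original, in samples. */
    static int estimateDelay(int[] original, int[] target) {
        return countLeadingZeros(target) - countLeadingZeros(original);
    }

    /** Returns a copy of the target with the estimated delay removed from its start. */
    static int[] stripDelay(int[] original, int[] target) {
        // Clamp so that a negative or oversized estimate cannot break the copy.
        int delay = Math.max(0, Math.min(estimateDelay(original, target), target.length));
        return Arrays.copyOfRange(target, delay, target.length);
    }
}
```

The original's own leading silence is subtracted first, so only the extra silence in the target is removed.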

The problem now is that, tested with real-world samples, the frames supposed to contain the encoder delay aren't empty! WTF? So the algorithm doesn't work in its simple form. Any ideas? Just stripping the difference in total sample count from the beginning of the target won't work either, since some encoders also add samples at the end of the file.

PS: Do common lossy codecs preserve transients with frame-exact accuracy? I just thought about measuring the distance from frame 0 to the steepest transient (the largest difference between two consecutive frames) in both files and then stripping the difference from the start of the target. Does anybody see possible pitfalls?
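
Roughly what I have in mind, as a sketch only: it assumes mono int samples, and the class and method names are hypothetical. It only gives a usable offset if the codec really keeps the steepest step at the same position, which is exactly the open question.

```java
/** Sketch of the single-steepest-transient idea; all names here are hypothetical. */
final class SteepestTransient {

    /**
     * Returns the index i (at least 1) with the largest |s[i] - s[i-1]|,
     * or -1 if the signal has fewer than two samples.
     */
    static int steepestPosition(int[] s) {
        int best = -1;
        long bestDelta = -1;
        for (int i = 1; i < s.length; i++) {
            // long arithmetic avoids overflow for extreme sample values
            long d = Math.abs((long) s[i] - s[i - 1]);
            if (d > bestDelta) {
                bestDelta = d;
                best = i;
            }
        }
        return best;
    }

    /** Number of samples by which the target's steepest transient lags the original's. */
    static int estimateOffset(int[] original, int[] target) {
        return steepestPosition(target) - steepestPosition(original);
    }
}
```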
rpp3po
I have followed up on the transient idea, which has expanded the code base to several times the size of the first try. The general strategy is to locate distinct positions in both samples and then use some heuristics to synchronize them.

In a first iteration I have taken transient steepness as the distinguishing attribute. The algorithm compiles a list of the 100 biggest differences between pairs of consecutive samples. This data is then used in a second step to calculate an offset.
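
The second step is not spelled out here, so the following is only one possible heuristic, shown as an assumption rather than the tool's actual method: let every pair of (original position, target position) vote for its position difference and take the difference with the most votes. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of an offset vote over two transient position lists; all names are hypothetical. */
final class OffsetVoting {

    /**
     * Returns the most frequent (target - original) position difference.
     * Ties go to the difference with the smaller absolute value; empty input yields 0.
     */
    static int mostCommonOffset(int[] originalPositions, int[] targetPositions) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (int po : originalPositions) {
            for (int pt : targetPositions) {
                votes.merge(pt - po, 1, Integer::sum);
            }
        }
        int best = 0;
        int bestVotes = 0;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            int offset = e.getKey();
            int count = e.getValue();
            if (count > bestVotes || (count == bestVotes && Math.abs(offset) < Math.abs(best))) {
                best = offset;
                bestVotes = count;
            }
        }
        return best;
    }
}
```

With positions 100, 250, 400 in the original and 1100, 1250, 1400 in the target, the difference 1000 collects three votes and wins. The vote only helps if both lists actually contain the same transients.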

Again it works on test samples, but not on real-world data. When comparing the tec_sqam_40a_bwf_tcm6-12548.wav sample with its LAME encoding, it turns out that LAME completely flattens all transients! Or at least it spreads them over several frames.

These are the results:

CODE

Original's transients:
Delta: -8464 Pos: 58130
Delta: -8484 Pos: 118488
Delta: -8514 Pos: 118470
Delta: 8519 Pos: 58102
Delta: -8650 Pos: 58100
Delta: -8662 Pos: 135541
Delta: 8806 Pos: 118466
Delta: 8977 Pos: 118433
Delta: 9136 Pos: 173506
Delta: 9336 Pos: 118471
Delta: 9343 Pos: 58129
Delta: -9686 Pos: 118434
Delta: 9899 Pos: 58081
Delta: 9978 Pos: 58111
Delta: -10326 Pos: 118483
Delta: -10337 Pos: 135534
Delta: -10370 Pos: 58082
Delta: 10405 Pos: 58078
Delta: 11096 Pos: 118482
Delta: 11745 Pos: 58090

Target's transients:
Delta: 1975 Pos: 343414
Delta: 1978 Pos: 343539
Delta: 1978 Pos: 345801
Delta: 1979 Pos: 324576
Delta: 1981 Pos: 336255
Delta: 1982 Pos: 325832
Delta: 1984 Pos: 343916
Delta: 1990 Pos: 333744
Delta: 1994 Pos: 332739
Delta: 1997 Pos: 331358
Delta: 1997 Pos: 333870
Delta: 1997 Pos: 343288
Delta: 2000 Pos: 331734
Delta: 2004 Pos: 327339
Delta: 2006 Pos: 332614
Delta: 2016 Pos: 331735
Delta: 2017 Pos: 325706
Delta: 2019 Pos: 332488
Delta: 2034 Pos: 345424
Delta: 2035 Pos: 331609



This is getting a little tiring all by myself. Please share if you have any ideas about how it can be improved.

This is the code of the fingerprinting class:

CODE
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Tracks the largest sample-to-sample deltas ("transients") of two streams.
 *
 * @author rpp3po
 */
public class Fingerprinter {

    // Number of transients kept per stream.
    private static final int TRANSIENT_COUNT = 100;

    private final SortedSet<TransientPos> originalTransients = new TreeSet<TransientPos>();
    private final SortedSet<TransientPos> targetTransients = new TreeSet<TransientPos>();
    // The first delta of each stream is measured against an implicit 0 sample.
    private int lastOriginalSample = 0;
    private int lastTargetSample = 0;

    public void addOriginalSample(int sample, int position) {
        addDelta(originalTransients, new TransientPos(sample - lastOriginalSample, position));
        lastOriginalSample = sample;
    }

    public void addTargetSample(int sample, int position) {
        addDelta(targetTransients, new TransientPos(sample - lastTargetSample, position));
        lastTargetSample = sample;
    }

    // Keeps only the TRANSIENT_COUNT entries with the largest absolute delta.
    // Filling the set on demand avoids seeding it with placeholder entries.
    private static void addDelta(SortedSet<TransientPos> set, TransientPos tp) {
        if (set.size() < TRANSIENT_COUNT) {
            set.add(tp);
        } else if (tp.compareTo(set.first()) > 0) {
            set.remove(set.first());
            set.add(tp);
        }
    }

    // Prints both sets in ascending order of absolute delta.
    public void dumpTransients() {
        System.out.println("Original's transients:");
        for (TransientPos tp : originalTransients) {
            System.out.println("Delta: " + tp.delta + "\t" + "Pos: " + tp.position);
        }
        System.out.println("Target's transients:");
        for (TransientPos tp : targetTransients) {
            System.out.println("Delta: " + tp.delta + "\t" + "Pos: " + tp.position);
        }
    }

    static class TransientPos implements Comparable<TransientPos> {

        private final int delta, position;

        public TransientPos(int delta, int position) {
            this.delta = delta;
            this.position = position;
        }

        // Orders by absolute delta, then by position, so entries with equal
        // steepness at different positions are both kept.
        public int compareTo(TransientPos other) {
            int i = Integer.compare(Math.abs(this.delta), Math.abs(other.delta));
            return i != 0 ? i : Integer.compare(this.position, other.position);
        }
    }
}


Two WAV files are fed sample by sample into the addOriginalSample and addTargetSample methods. After both files have been read, the Fingerprinter instance contains two sets of transient positions for further analysis.
menno
Cross-correlation works pretty well for this.
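
For illustration, a brute-force sketch of the idea in Java. It assumes mono samples in int arrays and a target delay between 0 and maxLag samples; the class and method names are made up and not taken from any existing tool.

```java
/** Brute-force normalized cross-correlation sketch; all names here are hypothetical. */
final class CrossCorrelation {

    /**
     * Returns the lag in [0, maxLag] at which the target, advanced by that many
     * samples, correlates best with the original.
     */
    static int bestLag(int[] original, int[] target, int maxLag) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = 0; lag <= maxLag; lag++) {
            int n = Math.min(original.length, target.length - lag);
            if (n <= 0) {
                break; // no overlap left at this lag or any larger one
            }
            double dot = 0, energyO = 0, energyT = 0;
            for (int i = 0; i < n; i++) {
                double o = original[i];
                double t = target[i + lag];
                dot += o * t;
                energyO += o * o;
                energyT += t * t;
            }
            // Normalizing makes the score independent of level differences.
            double denom = Math.sqrt(energyO * energyT);
            double score = denom > 0 ? dot / denom : 0;
            if (score > bestScore) {
                bestScore = score;
                best = lag;
            }
        }
        return best;
    }
}
```

This costs on the order of maxLag times the file length, so for long files an FFT-based correlation is the usual faster route. Keep maxLag small compared to the file length, because a tiny overlap can produce a misleadingly high score.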

BTW: There already is a pretty good tool for doing this: http://www-mmsp.ece.mcgill.ca/Documents/So.../CompAudio.html
You can download it from here: http://www-mmsp.ece.mcgill.ca/Documents/Downloads/AFsp/
rpp3po
Thanks for the reference! That's indeed a very interesting toolset that I didn't know until today.

I just can't see yet how this could help with file syncing. As far as I can see, it doesn't provide this capability.

PS: I see now that it provides a 'differential SNR' feature. I could think of using it to adjust the target's offset step by step until that measure indicates the best match. That would work, but it would be quite resource-intensive. Any other ideas, anybody? Or is there maybe already a Java lib to cross-correlate sequences of sample values?
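
Such an offset search could look roughly like the sketch below. It uses the mean squared difference between the two files as the mismatch measure, which is my stand-in and not necessarily what the tool's SNR figure computes. It assumes mono int samples and a delay between 0 and maxOffset; all names are hypothetical.

```java
/** Sketch of an exhaustive offset search; all names here are hypothetical. */
final class OffsetSearch {

    /**
     * Returns the offset in [0, maxOffset] that minimizes the mean squared
     * difference between the original and the target advanced by that offset.
     */
    static int bestOffset(int[] original, int[] target, int maxOffset) {
        int best = 0;
        double bestError = Double.POSITIVE_INFINITY;
        for (int offset = 0; offset <= maxOffset; offset++) {
            int n = Math.min(original.length, target.length - offset);
            if (n <= 0) {
                break; // no overlap left
            }
            double sum = 0;
            for (int i = 0; i < n; i++) {
                double d = (double) original[i] - target[i + offset];
                sum += d * d;
            }
            // Dividing by n keeps offsets with different overlap lengths comparable.
            double mse = sum / n;
            if (mse < bestError) {
                bestError = mse;
                best = offset;
            }
        }
        return best;
    }
}
```

The cost is the number of candidate offsets times the number of compared samples, which is where the resource concern comes from. Comparing only an excerpt of each file would cut that down.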