Suppose I have two audio files, A and B, which may be in any format. A is a 10-minute speech by person X. B is a short clip of part of that same speech, but spoken by another person, Y. I need to compare A and B and determine whether A contains the words (or parts) in B, or whether B is related to A in some other way. Please help me figure out how to do this, or point me to any algorithms that would help. It doesn't matter if they are hard to implement. The difficulties in this case are:
1) A and B can be in different file formats.
2) A can contain noise, music, etc., while B is clean, with no noise, and may sometimes be a computer-generated voice.
I also need to know whether there is a format in which these files can be compared easily. I have read that the Fast Fourier Transform (FFT) can be used to compare audio, but I'm not sure how to implement it; I only came across it while searching on Google.
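A minimal sketch of what FFT-based comparison can look like, assuming both files have already been decoded to the same "easy" format: mono PCM samples at the same sample rate (a tool such as ffmpeg can convert most formats to WAV). The sketch splits each signal into short overlapping frames, takes a Hann-windowed FFT magnitude spectrum of each frame, then slides B's sequence of spectra along A's and reports the offset with the highest average cosine similarity. The helper names (`fft`, `magnitude_spectrum`, `spectrogram`, `best_match`) and the `frame=256`, `hop=128` values are illustrative choices, not an established API. Note that raw spectral similarity like this mostly finds the same recording (for example a re-encoded or noisy copy); matching the same words spoken by a different voice would likely need speaker-independent features or speech-to-text, which this sketch does not attempt.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Hann-windowed magnitude spectrum; keeps bins 0..n/2 (real input)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(c) for c in fft(windowed)[: n // 2 + 1]]


def spectrogram(signal: Sequence[float], frame: int = 256, hop: int = 128) -> List[List[float]]:
    """One magnitude spectrum per full frame, frames starting every `hop` samples."""
    return [magnitude_spectrum(signal[i:i + frame])
            for i in range(0, len(signal) - frame + 1, hop)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def best_match(a: Sequence[float], b: Sequence[float],
               frame: int = 256, hop: int = 128) -> Tuple[int, float]:
    """Slide B's spectrogram over A's; return (sample offset in A, mean similarity)."""
    sa, sb = spectrogram(a, frame, hop), spectrogram(b, frame, hop)
    if not sb or len(sb) > len(sa):
        raise ValueError("B must span at least one frame and be no longer than A")
    best_off, best_score = 0, -1.0
    for off in range(len(sa) - len(sb) + 1):
        score = sum(cosine_similarity(sa[off + j], sb[j]) for j in range(len(sb))) / len(sb)
        if score > best_score:
            best_off, best_score = off, score
    return best_off * hop, best_score


if __name__ == "__main__":
    # Synthetic stand-in for real audio: 8 blocks of 1024 samples, each a different tone.
    sr = 8000
    a = [math.sin(2 * math.pi * (300 + 200 * (n // 1024)) * n / sr) for n in range(8 * 1024)]
    b = a[2048:3072]  # B is an excerpt of A
    print(best_match(a, b))
```

A score close to 1.0 means B's spectra line up with that stretch of A; in practice you would also need a threshold to decide when no part of A matches.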