Is it possible to tell an original voice from a cloned one in an mp3/wav file? ( https://www.youtube.com/watch?v=Kfr_FZof_hs ) I need to authenticate the file. Does a cloned voice carry any "marks" or characteristic frequencies on the audio track?
I looked at the audio track in Vegas and Audacity but did not find any difference.
I'm a bit late to your question, and you've probably found the answer somewhere else already. But, for the record:
Yes, there is a way to detect an audio deepfake made with tortoise-tts. James Betker (https://github.com/neonbjb/tortoise-tts) provides a Python script that, in my experience, does this with very high confidence.
In order to do so:
Install tortoise-tts. Go to: https://github.com/neonbjb/tortoise-tts and follow the instructions therein.
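For reference, at the time of writing the repo's README boils the manual install down to roughly the following. Treat this as a sketch and defer to the README, since the exact steps may change:
git clone https://github.com/neonbjb/tortoise-tts.git
cd tortoise-tts
python -m pip install -r ./requirements.txt
python setup.py install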
In the “tortoise-tts/tortoise” folder there is a script named is_this_from_tortoise.py, about which J. Betker says: “Out of concerns that this model might be misused, I've built a classifier that tells the likelihood that an audio clip came from Tortoise. […] This model has 100% accuracy on the contents of the results/ and voices/ folders in this repo. Still, treat this classifier as a "strong signal". Classifiers can be fooled and it is likewise not impossible for this classifier to exhibit false positives.”
You run it with:
python is_this_from_tortoise.py --clip=your_audio.wav
(I think only wav and mp3 formats are accepted)
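If your recording is in some other format, one option (assuming you have ffmpeg installed) is to convert it to wav before running the classifier, e.g.:
ffmpeg -i your_audio.m4a your_audio.wav
Here your_audio.m4a just stands in for whatever file you actually have.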
On a highly realistic audio deepfake made with tortoise, it gave me: “This classifier thinks there is a 99.99896240234375% chance that this clip was generated from Tortoise.”
Whereas on a legit audio file, the result was: “This classifier thinks there is a 5.9567817515926436e-05% chance that this clip was generated from Tortoise.”
But as J. Betker says: “Classifiers can be fooled and it is likewise not impossible for this classifier to exhibit false positives.”