
Is there any way to recognize a Tortoise-TTS voice clone file?


Is it possible to tell whether the voice in an mp3/wav file is the original or a clone (example: https://www.youtube.com/watch?v=Kfr_FZof_hs)? I need to authenticate the file. Does a cloned voice perhaps carry any "marks" or special frequencies in the audio track?

I looked at the audio track in Vegas and Audacity but did not find any difference.


Solution

  • I'm a bit late to your question, and you have probably found the answer elsewhere. But, for the record:

    Yes, there is a way to detect an audio deepfake made with tortoise-tts. James Betker (https://github.com/neonbjb/tortoise-tts) provides a Python script that does this with very high confidence (in my own experience with it).

    In order to do so:

    1. Install tortoise-tts: go to https://github.com/neonbjb/tortoise-tts and follow the instructions there.

    2. In the “tortoise-tts/tortoise” folder there is a script named is_this_from_tortoise.py, about which J. Betker says: “Out of concerns that this model might be misused, I've built a classifier that tells the likelihood that an audio clip came from Tortoise. […] This model has 100% accuracy on the contents of the results/ and voices/ folders in this repo. Still, treat this classifier as a "strong signal". Classifiers can be fooled and it is likewise not impossible for this classifier to exhibit false positives.”

    You run it with:

    python is_this_from_tortoise.py --clip=your_audio.wav
    

    (I think only wav and mp3 formats are accepted)

    On a highly realistic audio deepfake made with tortoise, it gave me: “This classifier thinks there is a 99.99896240234375% chance that this clip was generated from Tortoise.”

    Whereas on a legit audio file, the result was: “This classifier thinks there is a 5.9567817515926436e-05% chance that this clip was generated from Tortoise.”

    But as J. Betker says: “Classifiers can be fooled and it is likewise not impossible for this classifier to exhibit false positives.”
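
If you have many files to check, reading each percentage by eye gets tedious. Here is a rough sketch of a wrapper that runs the script and extracts the number from its output. It assumes you run it from the tortoise-tts/tortoise folder and that the script prints the same sentence quoted above to standard output. The helper names (parse_tortoise_percent, classify_clip, looks_generated) and the 50% threshold are my own inventions for illustration, not part of tortoise-tts.

```python
import re
import subprocess
import sys

# Matches the percentage in the sentence the script prints, including
# scientific notation such as 5.9567817515926436e-05.
_PERCENT_RE = re.compile(r"there is a ([0-9.]+(?:[eE][-+]?[0-9]+)?)% chance")


def parse_tortoise_percent(output: str) -> float:
    """Return the percentage found in the classifier's output text."""
    match = _PERCENT_RE.search(output)
    if match is None:
        raise ValueError("no classifier percentage found in output")
    return float(match.group(1))


def classify_clip(clip_path: str, script: str = "is_this_from_tortoise.py") -> float:
    """Run the classifier script on one clip and return its percentage.

    Assumption: the current directory is tortoise-tts/tortoise and the
    script writes its result sentence to standard output.
    """
    result = subprocess.run(
        [sys.executable, script, f"--clip={clip_path}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tortoise_percent(result.stdout)


def looks_generated(percent: float, threshold: float = 50.0) -> bool:
    """Hypothetical decision rule; the 50% threshold is an arbitrary choice."""
    return percent >= threshold
```

You could then loop over your files with something like classify_clip("your_audio.wav") and flag those for which looks_generated returns True. Whatever threshold you pick, the result is still only a "strong signal", as the author puts it.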