I'm currently investigating dejavu.py (some more info), and I must say that I am quite impressed by it so far. Though I do find that the docs are a bit incomplete when it comes to user interfacing.
When you recognise a song from file with oDjv.recognize(FileRecognizer, sFile), you get back a dictionary that looks like this:
{'song_id': 2, 'song_name': 'Sean-Fournier--Falling-For-You', 'file_sha1': 'A9D18B9B9DAA467350D1B6B249C36759282B962E', 'confidence': 127475, 'offset_seconds': 0.0, 'match_time': 32.23410487174988, 'offset': 0}
And from recording (oDjv.recognize(MicrophoneRecognizer, seconds=iSecs)):
{'song_id': 2, 'song_name': 'Sean-Fournier--Falling-For-You', 'file_sha1': 'A9D18B9B9DAA467350D1B6B249C36759282B962E', 'confidence': 124, 'offset_seconds': 24.89179, 'offset': 536}
So, to the questions:
1) What exactly is confidence, and is there an upper bound for the confidence level?
2) What is the difference between offset_seconds and offset?
3) Why does it take the algorithm somewhere between 30 and 60 seconds (in the case of all tests I ran) to identify the song from disk, but it can do it in 10 or so seconds when recording audio?
4) When running the function to record from audio, I get the following chunk of output preceding the actual result from the function (even if it is successful). Where are we trying to go?
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
5) Is there an online music database that I can just plug into the config?
dConfig = {
    "database": {
        "host": "some magical music database",
        "user": "root",
        "passwd": "",
        "db": "dejavu"
    }
}
oDjv = Dejavu(dConfig)
Most of your questions are answered either in the Dejavu GitHub README.md or in the writeup and explanation here.
But to answer each of your numbered questions:
1) confidence is the number of fingerprint hashes that "aligned" between the current audio clip and its closest match in the database. There's no probabilistic interpretation. Keep in mind there can be many thousands of fingerprints per audio file, so use that as a reference point.
2) offset_seconds is expressed in seconds, and offset is expressed in units of the algorithm's time bins.
3) When recording, the number of seconds to listen for is a parameter: python dejavu.py --recognize mic 5 listens for 5 seconds instead of the default of 10. FYI, one of the best options of the library is that you can also change the number of seconds Dejavu uses for on-disk recognition in the JSON config file with the fingerprint_limit key.
4) That output comes through pyaudio. In your case see this solution; it might help.
5) The fingerprinting parameters are tunable, for example DEFAULT_FAN_VALUE. Need higher collision guarantees but don't mind the extra storage cost? You can decrease FINGERPRINT_REDUCTION and keep more characters of each SHA-1. Dejavu is meant to adapt to many different use cases, which necessarily means that if you change the fingerprinting parameters in this file, your database will have a different distribution and structure.
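As a rough check on the offset versus offset_seconds point, the microphone result in the question can be reproduced by hand. This is only a sketch: the three constants are what I believe the fingerprinting defaults to be, and bins_to_seconds is a hypothetical helper, not part of Dejavu's API, so check the values against your own checkout.

```python
# Assumed defaults (verify against the fingerprinting module you are running).
DEFAULT_FS = 44100            # samples per second
DEFAULT_WINDOW_SIZE = 4096    # FFT window length, in samples
DEFAULT_OVERLAP_RATIO = 0.5   # fraction of each window shared with the next


def bins_to_seconds(offset_bins,
                    fs=DEFAULT_FS,
                    window_size=DEFAULT_WINDOW_SIZE,
                    overlap_ratio=DEFAULT_OVERLAP_RATIO):
    """Convert an offset in spectrogram time bins to seconds.

    Hypothetical helper: one time bin is treated as the hop between
    consecutive windows, i.e. window_size * (1 - overlap_ratio) samples.
    """
    hop_samples = window_size * (1 - overlap_ratio)
    return round(offset_bins * hop_samples / fs, 5)


# 'offset': 536 from the microphone result in the question
print(bins_to_seconds(536))  # 24.89179
```

With those assumed values, 536 bins comes out at the same 24.89179 seconds reported as offset_seconds.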
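For the on-disk timing, a config with fingerprint_limit might look like the sketch below. The placement of the key at the top level of the config and the value of 10 seconds are assumptions for illustration; the database values are simply copied from the question.

```python
import json

# Sketch of a config dict; "fingerprint_limit" at the top level is an
# assumption about where the key lives, and 10 is an arbitrary example value.
config = {
    "database": {
        "host": "127.0.0.1",
        "user": "root",
        "passwd": "",
        "db": "dejavu",
    },
    # Assumed meaning: only this many seconds of each file are used.
    "fingerprint_limit": 10,
}

# The same settings serialised as they would appear in a JSON config file.
config_json = json.dumps(config, indent=4)
print(config_json)
```

The dict can be passed to Dejavu(...) the same way dConfig is in the question. Since it changes how much of each track is read, it seems safest to pick the limit before fingerprinting your library rather than after.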
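If the ALSA and JACK chatter is only cosmetic for you, one generic way to hide it (not necessarily what the linked solution does) is to point file descriptor 2 at the null device while the audio device is being opened. suppress_stderr is a hypothetical helper written for this answer, and note that it also hides genuine error messages printed inside the block.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_stderr():
    """Temporarily redirect file descriptor 2 (stderr) to os.devnull.

    Works at the OS level, so it also silences messages written by C
    libraries, which a plain sys.stderr swap would not catch.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)                       # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)                 # fd 2 now points at the null device
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)                   # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)


# Usage: wrap whatever triggers the noise, e.g. the microphone
# recognition call from the question.
with suppress_stderr():
    os.write(2, b"this line is swallowed\n")
```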