I am attempting to convert 16 bit audio into 12 bit audio. However, I am quite inexperienced with such conversions and believe my approach is possibly incorrect or flawed.
The use case, as context for the code snippets below, is an Android app which the user can speak into and that audio is transmitted to an IoT device for immediate playback. The IoT device expects audio in mono 12 bit, 8k sample rate, little endian, unsigned, with the data stored in the first twelve bits (0-11) and final four bits (12-15) are zeroes. Audio data needs to be received in packets of 1000 bytes.
The audio is being created in the Android app through the use of AudioRecord. The instantiation of which is as follows:
int bufferSize = 1000;
this.audioRecord = new AudioRecord(
MediaRecorder.AudioSource.MIC,
8000,
AudioFormat.CHANNEL_IN_MONO,
AudioFormat.ENCODING_PCM_16BIT,
bufferSize
);
In a while loop, the AudioRecord is being read from by 1000 byte packets and modified to the specifications in the use case. Not sure this part is relevant, but for completeness:
byte[] buffer = new byte[1000];
audioRecord.read(buffer, 0, buffer.length);
byte[] modifiedBytes = convert16BitTo12Bit(buffer);
Then the modifiedBytes are sent off to the device.
Here are the methods which modify the bytes. Basically, to conform to the specifications, I am shifting the bits in each 16 bit set (tossing the least significant 4) and adding zeroes to the final four spots. I do this through BitSet.
/**
* Takes a byte array presented as 16 bit audio and converts it to 12 bit audio through bit
* manipulation. Packets must be of 1000 bytes or no manipulation will occur and the input
* will be immediately returned.
*/
private byte[] convert16BitTo12Bit(byte[] input) {
if (input.length == 1000) {
for (int i = 0; i < input.length; i += 2) {
Log.d(TAG, "convert16BitTo12Bit: pass #" + (i / 2));
byte[] chunk = new byte[2];
System.arraycopy(input, i, chunk, 0, 2);
if (!isEmptyByteArray(chunk)) {
byte[] modifiedBytes = convertChunk(chunk);
System.arraycopy(
modifiedBytes,
0,
input,
i,
modifiedBytes.length
);
}
}
return input;
}
Log.d(TAG, "convert16BitTo12Bit: Failed - input is not 1000 in length; it is " + input.length);
return input;
}
/**
* Converts 2 bytes 16 bit audio into 12 bit audio. If the input is not 2 bytes, the input
* will be returned without manipulation.
*/
private byte[] convertChunk(byte[] chunk) {
if (chunk.length == 2) {
BitSet bitSet = BitSet.valueOf(chunk);
Log.d(TAG, "convertChunk: bitSet starts as " + bitSet.toString());
modifyBitSet(bitSet);
Log.d(TAG, "convertChunk: bitSet ends as " + bitSet.toString());
return bitSet.toByteArray();
}
Log.d(TAG, "convertChunk: Failed = chunk is not 2 in length; it is " + chunk.length);
return chunk;
}
/**
* Removes the first four bits and shifts the rest to leave the final four bits as 0.
*/
private void modifyBitSet(BitSet bitSet) {
for (int i = 4; i < bitSet.length(); i++) {
bitSet.set(i - 4, bitSet.get(i));
}
if (bitSet.length() > 8) {
bitSet.clear(12, 16);
} else {
bitSet.clear(4, 8);
}
}
/**
* Returns true if the byte array input contains all zero bits.
*/
private boolean isEmptyByteArray(byte[] input) {
BitSet bitSet = BitSet.valueOf(input);
return bitSet.isEmpty();
}
Unfortunately, this approach produces subpar results. The audio is quite noisy and it is difficult to make out what someone is saying (but you can hear that words are being spoken).
I also have been playing around with just saving the bytes to a file and playing it back on Android through AudioTrack. I noticed that if I just remove the first four bits and do not shift anything, the audio actually sounds good, as such:
private void modifyBitSet(BitSet bitSet) {
bitSet.clear(0, 4);
}
However, when played through the device, it sounds even worse, and I don't even think I can make out any words.
Clearly, my approach is not working here. Central question is how would one convert a 16 bit chunk into 12 bit audio and maintain audio quality given the requirement that the final four bits must be zero? Additionally, given my larger approach of using AudioRecord to obtain the audio, would such a solution for the prior question fit this use case?
Please let me know if there is anything more I can provide to clarify these questions and my intent.
I have discovered a solution that produces clear audio. First, it is important to recount the requirements for the use case, which is 12 bit unsigned mono audio which will be read in little endian by the device in packets of 1000 bytes.
The initialization and configuration of the AudioRecord as described in the question is fine.
Once the 1000 bytes of audio is read from AudioRecord, it can be put into a ByteBuffer and defined as little endian for modification, and then put into a ShortBuffer to do manipulation on the 16 bit level:
// Audio specifications of device is in little endian.
ByteBuffer byteBuffer = ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN);
// Turn into a ShortBuffer so bitwise manipulation can occur on the 16 bit level.
ShortBuffer shortBuffer = byteBuffer.asShortBuffer();
Next, in a loop, take each short and modify it to 12 bit unsigned:
for (int i = 0; i < shortBuffer.capacity(); i++) {
short currentShort = shortBuffer.get(i);
shortBuffer.put(i, convertShortTo12Bit(currentShort));
}
This can be accomplished by shifting the 16 bits four spaces to the right to turn it into 12 bit signed. Then, to convert to unsigned, add 2048. For our purposes as a safety step, we also mask the least significant four bits as required by device, but given the shifting and adding, it shouldn't be the case that any bits actually remain there:
private static short convertShortTo12Bit(short input) {
int inputAsInt = input;
inputAsInt >>>= 4;
inputAsInt += 2048;
input = (short) (inputAsInt & 0B0000111111111111);
return input;
}
If one wishes to return 12 bits to 16 bits, do the reverse for each short (subtract 2048 and shift four spaces to the left).