[SOLVED] Proper use of `nalu_process` callback in x264

I wish to make use of libx264's low-latency encoding mechanism, whereby a user-provided callback is called as soon as a single NAL unit is available instead of having to wait for a whole frame to be encoded before starting processing.

The x264 documentation states the following about that facility:

/* Optional low-level callback for low-latency encoding.  Called for each output NAL unit
 * immediately after the NAL unit is finished encoding.  This allows the calling application
 * to begin processing video data (e.g. by sending packets over a network) before the frame
 * is done encoding.
 *
 * This callback MUST do the following in order to work correctly:
 * 1) Have available an output buffer of at least size nal->i_payload*3/2 + 5 + 64.
 * 2) Call x264_nal_encode( h, dst, nal ), where dst is the output buffer.
 * After these steps, the content of nal is valid and can be used in the same way as if
 * the NAL unit were output by x264_encoder_encode.
 *
 * This does not need to be synchronous with the encoding process: the data pointed to
 * by nal (both before and after x264_nal_encode) will remain valid until the next
 * x264_encoder_encode call.  The callback must be re-entrant.
 *
 * This callback does not work with frame-based threads; threads must be disabled
 * or sliced-threads enabled.  This callback also does not work as one would expect
 * with HRD -- since the buffering period SEI cannot be calculated until the frame
 * is finished encoding, it will not be sent via this callback.
 *
 * Note also that the NALs are not necessarily returned in order when sliced threads is
 * enabled.  Accordingly, the variable i_first_mb and i_last_mb are available in
 * x264_nal_t to help the calling application reorder the slices if necessary.
 *
 * When this callback is enabled, x264_encoder_encode does not return valid NALs;
 * the calling application is expected to acquire all output NALs through the callback.
 *
 * It is generally sensible to combine this callback with a use of slice-max-mbs or
 * slice-max-size.
 *
 * The opaque pointer is the opaque pointer from the input frame associated with this
 * NAL unit. This helps distinguish between nalu_process calls from different sources,
 * e.g. if doing multiple encodes in one process.
 */
void (*nalu_process)( x264_t *h, x264_nal_t *nal, void *opaque );
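The two bits of bookkeeping the comment leaves to the caller, sizing the output buffer and putting out-of-order slices back in sequence, can be sketched on their own. This is a minimal sketch under stated assumptions: `slice_info_t`, `nal_dst_size`, `cmp_first_mb` and `reorder_slices` are hypothetical names, and the struct only mirrors the three `x264_nal_t` fields mentioned above so that the example builds without libx264.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the x264_nal_t fields used here. */
typedef struct {
    int i_first_mb;  /* first macroblock covered by the slice */
    int i_last_mb;   /* last macroblock covered by the slice */
    int i_payload;   /* payload size in bytes */
} slice_info_t;

/* Minimum size of dst for x264_nal_encode, per the header comment:
 * i_payload*3/2 + 5 + 64. */
size_t nal_dst_size(int i_payload)
{
    assert(i_payload >= 0);
    return (size_t)i_payload * 3 / 2 + 5 + 64;
}

/* qsort comparator: ascending i_first_mb. */
int cmp_first_mb(const void *a, const void *b)
{
    const slice_info_t *x = a;
    const slice_info_t *y = b;
    return (x->i_first_mb > y->i_first_mb) - (x->i_first_mb < y->i_first_mb);
}

/* Put the slices collected for one frame back into macroblock order. */
void reorder_slices(slice_info_t *slices, size_t n)
{
    qsort(slices, n, sizeof *slices, cmp_first_mb);
}
```

In a real callback the size check would run on `nal->i_payload` before calling `x264_nal_encode`, and the reordering would only be needed when sliced threads are enabled.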

This seems straightforward enough. However, when I run the following dummy code, I get a segfault on the marked line. I've tried adding some debugging to x264_nal_encode itself to understand where it goes wrong, but it seems to be the function call itself that results in a segfault. Am I missing something here? (Let's ignore the fact that the use of assert probably makes cb non-reentrant; it's only there to indicate to the reader that my workspace buffer is more than large enough.)

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <x264.h>

#define WS_SIZE 10000000
uint8_t * workspace;

void cb(x264_t * h, x264_nal_t * nal, void * opaque)
{
  assert((nal->i_payload*3)/2 + 5 + 64 < WS_SIZE);
  x264_nal_encode(h, workspace, nal); // Segfault here.
  // Removed: Process nal.
}

int main(int argc, char ** argv)
{
  uint8_t * fake_frame = malloc(1280*720*3);
  memset(fake_frame, 0, 1280*720*3);

  workspace = malloc(WS_SIZE);

  x264_param_t param;
  int status = x264_param_default_preset(&param, "ultrafast", "zerolatency");
  assert(status == 0);

  param.i_csp = X264_CSP_RGB;
  param.i_width = 1280;
  param.i_height = 720;
  param.i_threads = 1;
  param.i_lookahead_threads = 1;
  param.i_frame_total = 0;
  param.i_fps_num = 30;
  param.i_fps_den = 1;
  param.i_slice_max_size = 1024;
  param.b_annexb = 1;
  param.nalu_process = cb;

  status = x264_param_apply_profile(&param, "high444");
  assert(status == 0);

  x264_t * h = x264_encoder_open(&param);
  assert(h);

  x264_picture_t pic;
  status = x264_picture_alloc(&pic, param.i_csp, param.i_width, param.i_height);
  assert(status == 0);
  assert(pic.img.i_plane == 1);

  x264_picture_t pic_out;
  x264_nal_t * nal; // Not used. We process NALs in cb.
  int i_nal;

  for (int i = 0; i < 100; ++i)
  {
    pic.i_pts = i;
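    // Copy into the buffer owned by pic. Overwriting pic.img.plane[0] would leak that
    // buffer and make x264_picture_clean free fake_frame before free(fake_frame) does.
    memcpy(pic.img.plane[0], fake_frame, 1280*720*3);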
    status = x264_encoder_encode(h, &nal, &i_nal, &pic, &pic_out);
  }

  x264_encoder_close(h);
  x264_picture_clean(&pic);
  free(workspace);
  free(fake_frame);
  return 0;
}

Edit: The segfault happens the first time cb calls x264_nal_encode. If I switch to a different preset, where more frames are encoded before the first callback happens, then several calls to x264_encoder_encode succeed before the first callback, and hence the segfault, occurs.

Solution

After discussions with x264 developers on IRC, it seems that the behavior I was seeing is, in fact, a bug in x264. The x264_t * h passed to the callback is incorrect. If one overrides that handle with the good one (the one obtained from x264_encoder_open), things work fine.
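The workaround amounts to ignoring the handle the callback receives and using one saved at open time. A minimal sketch of that pattern follows, assuming a single encoder per process. So that it builds without libx264, it uses hypothetical stand-ins: `enc_t` for `x264_t`, `nal_t` for `x264_nal_t`, and `nal_encode_stub` for `x264_nal_encode`. The globals `g_good_handle`, `g_handle_used` and `g_workspace` are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; real code uses x264_t and x264_nal_t from <x264.h>. */
typedef struct { int id; } enc_t;
typedef struct { int i_payload; } nal_t;

enc_t   *g_good_handle;      /* saved from the open call (x264_encoder_open) */
enc_t   *g_handle_used;      /* records which handle reached the encode step */
uint8_t  g_workspace[4096];  /* output buffer for the encoded NAL */

/* Plays the role of x264_nal_encode; it only records the handle it was given. */
void nal_encode_stub(enc_t *h, uint8_t *dst, nal_t *nal)
{
    (void)dst;
    (void)nal;
    g_handle_used = h;
}

/* Callback with the same shape as nalu_process. */
void cb(enc_t *h_from_callback, nal_t *nal, void *opaque)
{
    (void)h_from_callback;  /* may be wrong on affected x264 builds, so ignore it */
    (void)opaque;
    assert(g_good_handle != NULL);
    assert((size_t)nal->i_payload * 3 / 2 + 5 + 64 <= sizeof g_workspace);
    nal_encode_stub(g_good_handle, g_workspace, nal);
}
```

With the real library, `g_good_handle` would be assigned right after `x264_encoder_open` returns and before the first `x264_encoder_encode` call. A global only works for one encoder; with several encoders, carrying the handle in the input picture's opaque pointer might serve the same purpose, though I have not verified that on the affected versions.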

I identified x264 git commit 71ed44c7312438fac7c5c5301e45522e57127db4 as the first bad one. The bug is documented as this x264 issue.

Update for future readers: I believe this issue has been fixed in commit 544c61f082194728d0391fb280a6e138ba320a96.