c linux-kernel device-mapper

Create a non-trivial device mapper target


I am trying to write a remapping target for use with device mapper (DM).

I followed instructions from several places (including this Answer), all giving essentially the same code.

That works, but it is not enough for my purposes.

I need to modify the data of the struct bio "in transit", while it is being remapped.

This means I need to make a deep clone of the bio, including its data; apparently the provided functions (e.g. bio_clone_bioset()) do not copy the data at all, but point the clone's bio_vecs at the original pages and offsets.
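The difference between the two kinds of clone can be illustrated in plain user-space C. The sketch below is not kernel code: struct seg, struct req and the helper functions are hypothetical stand-ins for struct bio_vec, struct bio and the kernel's cloning functions. It only shows that a shallow clone shares its data buffers with the original, while a deep clone owns private copies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-ins, NOT the kernel structures. */
struct seg {
    unsigned char *buf;   /* plays the role of bv_page + bv_offset */
    size_t len;           /* plays the role of bv_len (assumed > 0) */
};

struct req {
    struct seg *segs;
    size_t nsegs;
};

/* Shallow clone: a new segment array pointing at the SAME data buffers. */
static struct req *req_clone_shallow(const struct req *src)
{
    struct req *c;

    assert(src->nsegs > 0);
    c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    c->segs = malloc(src->nsegs * sizeof(*c->segs));
    if (!c->segs) {
        free(c);
        return NULL;
    }
    memcpy(c->segs, src->segs, src->nsegs * sizeof(*c->segs));
    c->nsegs = src->nsegs;
    return c;
}

/* Deep clone: shallow clone, then replace every buffer with a private copy. */
static struct req *req_clone_deep(const struct req *src)
{
    struct req *c = req_clone_shallow(src);
    size_t i;

    if (!c)
        return NULL;
    for (i = 0; i < c->nsegs; i++) {
        unsigned char *p = malloc(src->segs[i].len);

        if (!p) {
            /* Undo: free only the private buffers allocated so far. */
            while (i--)
                free(c->segs[i].buf);
            free(c->segs);
            free(c);
            return NULL;
        }
        memcpy(p, src->segs[i].buf, src->segs[i].len);
        c->segs[i].buf = p;
    }
    return c;
}

/* Free a clone; pass owns_data != 0 only for deep clones. */
static void req_free(struct req *r, int owns_data)
{
    size_t i;

    if (owns_data)
        for (i = 0; i < r->nsegs; i++)
            free(r->segs[i].buf);
    free(r->segs);
    free(r);
}
```

In this model, writing through a shallow clone changes the original's buffer, whereas writing through a deep clone leaves the original untouched.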

I tried some variations of the following scheme:

void
mt_copy(struct bio *dst, struct bio *src) {
    struct bvec_iter src_iter, dst_iter;
    struct bio_vec src_bv, dst_bv;
    void *src_p, *dst_p;
    unsigned bytes;
    unsigned salt;

    src_iter = src->bi_iter;
    dst_iter = dst->bi_iter;
    salt = src_iter.bi_sector;

    while (1) {
        if (!src_iter.bi_size) {
            break;
        }

        if (!dst_iter.bi_size) {
            break;
        }

        src_bv = bio_iter_iovec(src, src_iter);
        dst_bv = bio_iter_iovec(dst, dst_iter);

        bytes = min(src_bv.bv_len, dst_bv.bv_len);

        src_p = kmap_atomic(src_bv.bv_page);
        dst_p = kmap_atomic(dst_bv.bv_page);

        memcpy(dst_p + dst_bv.bv_offset, src_p + src_bv.bv_offset, bytes);

        kunmap_atomic(dst_p);
        kunmap_atomic(src_p);

        bio_advance_iter(src, &src_iter, bytes);
        bio_advance_iter(dst, &dst_iter, bytes);
    }
}

static struct bio *
mt_clone(struct bio *bio) {
        struct bio    *clone;

        clone = bio_clone_bioset(bio, GFP_KERNEL, NULL);
        if (!clone) {
                return NULL;
        }
        if (bio_alloc_pages(clone, GFP_KERNEL)) {
                bio_put(clone);
                return NULL;
        }

        clone->bi_private = bio;

        if (bio_data_dir(bio) == WRITE) {
                mt_copy(clone, bio);
        }

        return clone;
}

static int
mt_map(struct dm_target *ti, struct bio *bio) {
        struct mt_private *mdt = (struct mt_private *) ti->private;

        bio->bi_bdev = mdt->dev->bdev;

        bio = mt_clone(bio);
        if (!bio) {
                /* clone or page allocation failed; do not submit NULL */
                return -ENOMEM;
        }
        submit_bio(bio->bi_rw, bio);

        return DM_MAPIO_SUBMITTED;
}

This, however, does not work.

When I call submit_bio() on the cloned bio, I never get the .end_io call and the calling task blocks ("INFO: task mount:488 blocked for more than 120 seconds."). This happens with a READ request consisting of a single bio_vec (1024 bytes). In this case, of course, the buffers do not need to be copied beforehand because they will be overwritten; instead I need to copy the incoming data back onto the original buffers after the request has completed... but I never get that far.

I am evidently missing something, but I cannot work out what.

Note: I deliberately did not optimize anything (e.g. use smarter allocation strategies) because I need to get the basics working first.

Note: I corrected a mistake (thanks @RuslanRLaishev), but unfortunately it made no difference; see my own answer.


Solution

  • It turns out that bio_clone_bioset() and friends do not copy the address of the callback to invoke when the request completes.

    The trivial solution is to add clone->bi_end_io = bio->bi_end_io; before the end of mt_clone().

    Unfortunately this is not enough to make the code functional, because upper layers can spawn thousands of in-flight requests (i.e. requests queued and preprocessed before the previous ones complete), leading to memory starvation. Trying to slow down the upper layers by returning DM_MAPIO_REQUEUE does not seem to work (see: https://unix.stackexchange.com/q/410525/130498). That, however, has nothing to do with the current question.
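Returning to the completion callback: the question also needs read data copied back into the original buffers once the clone finishes. One possible arrangement, which differs from simply copying bi_end_io, is to give the clone its own completion callback that copies the data back and then completes the original request. The following is a user-space model of that idea only; struct req, req_complete(), clone_end_io(), clone_with_own_buffer() and record_end_io() are hypothetical names, not kernel APIs, and the kernel equivalent of completing the original bio depends on the kernel version.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of a request with a completion callback. */
struct req {
    unsigned char *data;
    size_t len;
    int is_write;
    void (*end_io)(struct req *r, int error);  /* models bi_end_io */
    void *private;                             /* models bi_private */
};

/* Stand-in for the lower layer signalling that a request has finished. */
static void req_complete(struct req *r, int error)
{
    if (r->end_io)
        r->end_io(r, error);
}

/*
 * Completion of the clone: for a successful READ, copy the data back into
 * the original buffer, then release the clone and complete the original.
 */
static void clone_end_io(struct req *clone, int error)
{
    struct req *orig = clone->private;

    if (!error && !clone->is_write)
        memcpy(orig->data, clone->data, orig->len);

    free(clone->data);
    free(clone);
    req_complete(orig, error);
}

/* Deep clone with a private buffer and its own completion callback. */
static struct req *clone_with_own_buffer(struct req *orig)
{
    struct req *clone;

    assert(orig->len > 0);
    clone = malloc(sizeof(*clone));
    if (!clone)
        return NULL;
    clone->data = malloc(orig->len);
    if (!clone->data) {
        free(clone);
        return NULL;
    }
    clone->len = orig->len;
    clone->is_write = orig->is_write;
    if (orig->is_write)                 /* WRITE: payload is copied up front */
        memcpy(clone->data, orig->data, orig->len);
    clone->end_io = clone_end_io;       /* without this, nobody is notified */
    clone->private = orig;
    return clone;
}

/* Example completion for the original: stores the error code in *private. */
static void record_end_io(struct req *r, int error)
{
    *(int *)r->private = error;
}
```

In this model, omitting the clone->end_io assignment reproduces the symptom from the question: the lower layer finishes the clone, but the original request is never completed and its submitter waits forever.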