php d3.js bioinformatics vcf-variant-call-format bam

How to convert a large bio-format file into a database-like file that can be accessed asynchronously with JavaScript

Pros!

I have a visualization project that renders biological data as canvas charts, using a JavaScript framework called igv.js to generate the canvas.

Here’s a simple config demo:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>IGV Data Vis</title>
    <link rel="stylesheet" href="source/jquery-ui.css">
    <link rel="stylesheet" href="source/font-awesome.min.css">
    <link rel="stylesheet" href="source/igv-1.0.1.css">
    <script src="source/jquery.min.js"></script>
    <script src="source/jquery-ui.min.js"></script>
    <script src="source/igv-1.0.1.js"></script>
</head>
<body>
    <div id="container"></div>

    <script>
        let options = {
                palette: ["#00A0B0", "#6A4A3C", "#CC333F", "#EB6841"],
                locus: "7:55,085,725-55,276,031",

                reference: {
                    id: "hg19",
                    fastaURL: "//igv.broadinstitute.org/genomes/seq/1kg_v37/human_g1k_v37_decoy.fasta",
                    cytobandURL: "//igv.broadinstitute.org/genomes/seq/b37/b37_cytoband.txt"
                },

                trackDefaults: {
                    bam: {
                        coverageThreshold: 0.2,
                        coverageQualityWeight: true
                    }
                },

                tracks: [
                    {
                        name: "Genes",
                        url: "//igv.broadinstitute.org/annotations/hg19/genes/gencode.v18.collapsed.bed",
                        index: "//igv.broadinstitute.org/annotations/hg19/genes/gencode.v18.collapsed.bed.idx",
                        displayMode: "EXPANDED",
                        height: 350,
                        color: '#ff0000'
                    }
                ]
            };

        let browser = igv.createBrowser(document.getElementById('container'), options);
    </script>
</body>
</html>

The items in tracks in the code above point to bioinformatics data files, which can be plain-text files or binary files (*.bam).

The problem is that these files are so large that I cannot access them directly, let alone send them to clients. For example:

.bam: approximately 3 GB
.vcf: approximately 1 GB

So, is there a back-end solution that makes those files accessible piece by piece, the way AJAX loads data on demand?

Any suggestions will be appreciated!

Solution

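It depends on what you mean by 'piece by piece'.

BAM files, and VCF files compressed with bgzip, use the block-compressed BGZF format, which supports random access through an index (.bai for BAM, .tbi for a bgzip-compressed VCF). This works even over the web, as long as the hosting server supports HTTP 'Range:' requests (byte ranges). For example, tabix can pull a single region out of a remote file: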

$ tabix "http://igv.broadinstitute.org/annotations/hg19/genes/gencode.v18.collapsed.bed.gz" "1:40723778-40759856"

1   40723778    40759856    ZMPSTE24    1000.0  +   40723778    40759856    .   17  288,159,156,183,147,72,87,51,117,153,142,185,105,353,144,1740,177,  0,129,132,1243,2732,4727,9679,9679,10312,11868,13787,23236,27818,32538,32747,34338,34338,
1   40728343    40728656    RP1-39G22.4 1000.0  -   40728343    40728656    .   1   313,    0,
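
On the JavaScript side, the same kind of partial read can be sketched with a plain HTTP range request. This is only a minimal illustration, assuming a server that honours the Range header and a runtime with a global fetch (a modern browser or Node 18+). The helper names rangeHeader and fetchByteRange are made up for this example and are not part of igv.js or tabix. A real client would first read the index to learn which byte span covers the region of interest, then decompress the BGZF blocks it receives.

```javascript
// Build the value of an HTTP Range header for an inclusive byte span.
// (Hypothetical helper, not an igv.js API.)
function rangeHeader(start, end) {
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
        throw new RangeError("invalid byte range: " + start + "-" + end);
    }
    return "bytes=" + start + "-" + end;
}

// Fetch only bytes [start, end] of a remote file and return them as a Uint8Array.
// fetchImpl defaults to the global fetch; it can be swapped out for testing.
async function fetchByteRange(url, start, end, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl(url, {
        headers: { Range: rangeHeader(start, end) }
    });
    // 206 Partial Content means the server honoured the range.
    // A 200 would mean it ignored the header and is sending the whole file.
    if (response.status !== 206) {
        throw new Error("server did not return partial content (status " + response.status + ")");
    }
    return new Uint8Array(await response.arrayBuffer());
}
```

With a placeholder URL, fetchByteRange("https://example.org/data/sample.bam", 0, 65535) would ask for only the first 64 KiB of the file, so neither the server nor the client has to load the full 3 GB.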

For bioinformatics questions, you can also ask on biostars.org.