I have downloaded the human HDF5 file (28 GB) from the ARCHS4 RNA-seq resource. I want to access the expression data and the group information. I am using the code below:
h5_exprs <- h5read("archs4_gene_human_v2.1.2.h5", "data/expression")
It throws
Error (scratch_11.R#9): Error in h5checktype(). H5Identifier not valid.
What extra step do I need to take to solve this?
When I run h5ls("archs4_gene_human_v2.1.2.h5"), the output looks like this:
group name otype dclass dim
0 / data H5I_GROUP
1 /data expression H5I_DATASET INTEGER 620825 x 62548
2 / meta H5I_GROUP
3 /meta genes H5I_GROUP
4 /meta/genes gene_symbol H5I_DATASET STRING 62548
5 /meta samples H5I_GROUP
6 /meta/samples aligned_reads H5I_DATASET INTEGER 620825
7 /meta/samples channel_count H5I_DATASET STRING 620825
8 /meta/samples characteristics_ch1 H5I_DATASET STRING 620825
9 /meta/samples contact_address H5I_DATASET STRING 620825
10 /meta/samples contact_city H5I_DATASET STRING 620825
11 /meta/samples contact_country H5I_DATASET STRING 620825
12 /meta/samples contact_institute H5I_DATASET STRING 620825
13 /meta/samples contact_name H5I_DATASET STRING 620825
14 /meta/samples contact_zip H5I_DATASET STRING 620825
15 /meta/samples data_processing H5I_DATASET STRING 620825
16 /meta/samples extract_protocol_ch1 H5I_DATASET STRING 620825
17 /meta/samples geo_accession H5I_DATASET STRING 620825
18 /meta/samples instrument_model H5I_DATASET STRING 620825
19 /meta/samples last_update_date H5I_DATASET STRING 620825
20 /meta/samples library_selection H5I_DATASET STRING 620825
21 /meta/samples library_source H5I_DATASET STRING 620825
22 /meta/samples library_strategy H5I_DATASET STRING 620825
23 /meta/samples molecule_ch1 H5I_DATASET STRING 620825
24 /meta/samples organism_ch1 H5I_DATASET STRING 620825
25 /meta/samples platform_id H5I_DATASET STRING 620825
26 /meta/samples relation H5I_DATASET STRING 620825
27 /meta/samples series_id H5I_DATASET STRING 620825
28 /meta/samples singlecellprobability H5I_DATASET FLOAT 620825
29 /meta/samples source_name_ch1 H5I_DATASET STRING 620825
30 /meta/samples sra_id H5I_DATASET STRING 620825
31 /meta/samples status H5I_DATASET STRING 620825
32 /meta/samples submission_date H5I_DATASET STRING 620825
33 /meta/samples taxid_ch1 H5I_DATASET STRING 620825
34 /meta/samples title H5I_DATASET STRING 620825
35 /meta/samples type H5I_DATASET STRING 620825
I'm not sure of the cause of this error. I haven't downloaded the whole 28 GB file, but I am able to read subsets of the /data/expression dataset directly from the S3 storage, e.g.
library(rhdf5)
h5file <- 'https://s3.dev.maayanlab.cloud/archs4/archs4_gene_human_v2.1.2.h5'
h5read(file = h5file,
name = "/data/expression",
index = list(1:10, 1:12),
s3 = TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,] 353 1110 3 0 0 51 0 2 0 0 467 0
#> [2,] 342 873 2 0 1 33 1 5 0 0 388 0
#> [3,] 358 1171 1 0 0 41 0 5 0 0 391 0
#> [4,] 393 849 1 0 0 40 0 0 0 0 148 0
#> [5,] 427 821 0 0 0 30 0 0 0 0 112 0
#> [6,] 293 613 1 0 0 22 3 3 0 0 112 0
#> [7,] 0 0 0 1 0 0 0 0 0 0 0 0
#> [8,] 0 0 0 3 0 0 0 0 0 0 0 0
#> [9,] 1 0 0 5 0 0 0 0 0 0 0 0
#> [10,] 0 0 0 3 0 0 0 0 0 0 0 0
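Since the question also asks about the group information, the same indexed reads should work for the metadata. This is only a sketch: it assumes the first dimension of /data/expression indexes samples and the second indexes genes, which is what the matching lengths in the h5ls() listing suggest (620825 for /meta/samples, 62548 for /meta/genes). The variable names are mine.

```r
library(rhdf5)

# Same public URL as in the example above.
h5file <- 'https://s3.dev.maayanlab.cloud/archs4/archs4_gene_human_v2.1.2.h5'

# Assumption: rows of /data/expression are samples and columns are genes,
# inferred from the dataset lengths reported by h5ls().
sample_ids <- h5read(file = h5file,
                     name = "/meta/samples/geo_accession",
                     index = list(1:10),
                     s3 = TRUE)
gene_symbols <- h5read(file = h5file,
                       name = "/meta/genes/gene_symbol",
                       index = list(1:12),
                       s3 = TRUE)

exprs <- h5read(file = h5file,
                name = "/data/expression",
                index = list(1:10, 1:12),
                s3 = TRUE)

# Label the subset so each count can be traced back to a sample and a gene.
dimnames(exprs) <- list(as.character(sample_ids), as.character(gene_symbols))
```

Other fields under /meta/samples (for example series_id or source_name_ch1) can be read the same way and used to define groups.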
A few thoughts:
- Are you sure the h5read() command you've indicated is really what's found on line 9 of scratch_11.R?
- Try running h5errorHandling(type = "verbose") before running h5read(), which will give a larger HDF5 error stack and might help narrow down the issue.
- Reading the whole dataset in one go needs a lot of memory, although I'd expect an "unable to allocate vector of size ..." error if that was the issue.
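The verbose error handling and the memory question can be checked with something like the sketch below. It assumes the file sits in the working directory under the name used in the question, and that the values end up as 4-byte R integers once read. Under that assumption the full matrix would need roughly 145 GiB, so reading a block via index is the safer test.

```r
# Path from the question; adjust to wherever the file was downloaded.
h5file <- "archs4_gene_human_v2.1.2.h5"

# Dimensions of /data/expression as reported by h5ls() in the question.
n_rows <- 620825
n_cols <- 62548

# Assumption: each value occupies 4 bytes (an R integer) once in memory.
full_size_gib <- n_rows * n_cols * 4 / 1024^3
message(sprintf("Reading everything would need about %.0f GiB of RAM",
                full_size_gib))

if (file.exists(h5file)) {
  # Ask the HDF5 library to print its full error stack if a call fails.
  rhdf5::h5errorHandling(type = "verbose")

  # Read a small block instead of the whole dataset.
  exprs_subset <- rhdf5::h5read(h5file, "/data/expression",
                                index = list(1:10, 1:12))
  print(dim(exprs_subset))
}
```

If the small read succeeds locally but the full read fails, memory is the more likely culprit. If even the small read fails, the verbose error stack should say more about the file or the identifier.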