After starting a bioinformatics workflow in Coretex, I get the following message even though the data seems to be in order:
"Failed to determine which column contains sampleIDs/names..." followed by the list of available names, but I am using one of the names from that list.
I am trying to run a microbiome sequencing task in Coretex, using standard microbiome sequencing data in .fastq.gz format. The run should succeed, but it fails every time.
This is the R code I am working with for loading the metadata:
loadMetadata <- function(metadataSample) {
    metadata_csv_path <- builtins$str(
        metadataSample$joinPath("metadata.csv")
    )

    if (file.exists(metadata_csv_path)) {
        # Default SampleSheet.csv format
        metadata <- read.table(
            metadata_csv_path,
            sep = ",",
            header = TRUE,
            check.names = TRUE
        )
    } else {
        # Format accepted by qiime2
        metadata_tsv_path <- builtins$str(
            metadataSample$joinPath("metadata.tsv")
        )

        if (!file.exists(metadata_tsv_path)) {
            stop("Metadata file not found")
        }

        metadata <- read.table(
            metadata_tsv_path,
            sep = "\t",
            header = TRUE,
            check.names = TRUE
        )

        # qiime has 1 extra row after header which contains types
        metadata <- metadata[-1,]
    }

    # Remove leading and trailing whitespace
    colnames(metadata) <- lapply(colnames(metadata), trimws)
    stringColumns <- names(metadata)[vapply(metadata, is.character, logical(1))]
    metadata[, stringColumns] <- lapply(metadata[, stringColumns], trimws)

    sampleIdColumn <- getSampleIdColumnName(metadata)
    print(paste("Matched metadata sample ID/name column to", sampleIdColumn))

    print("Renaming metadata sample ID/name column to \"sampleId\"")
    names(metadata)[names(metadata) == sampleIdColumn] <- "sampleId"

    print("Metadata")
    print(colnames(metadata))
    print(head(metadata))
    print(metadata$sampleId)

    # assign the names of samples (01Sat1...) to metadata rows instead of 1,2,3...
    row.names(metadata) <- metadata$sampleId
    metadata$sampleId <- as.factor(metadata$sampleId)

    return(metadata)
}
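Judging by the logs of your Coretex Workflow, it looks like your Dataset contains a metadata.csv file that uses ; as a separator, but the Coretex Task for loading BioInformatics data tries to load it with , as the separator. With , forced as the separator, each line of the file is read as a single field, so the whole header ends up as one column name, which would explain why the sample ID column cannot be determined. This was changed in the latest version of the Task and you can see the full changelog here.
Instead of always forcing the separator to be , (old version):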
# Default SampleSheet.csv format
metadata <- read.table(
    metadata_csv_path,
    sep = ",",
    header = TRUE,
    check.names = TRUE
)
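To see what the old call does with a semicolon-separated file, here is a minimal sketch in base R. The file content and its column names (bodySite, subject) are made up for illustration and are not taken from your Dataset; only the read.table arguments mirror the old version of the Task.

```r
# Hypothetical metadata.csv content that uses ";" as the separator
csv_text <- "sampleId;bodySite;subject\n01Sat1;gut;A\n02Sat1;tongue;B"

# Old behaviour: the separator is forced to ","
old <- read.table(
    text = csv_text,
    sep = ",",
    header = TRUE,
    check.names = TRUE
)

# The whole header line becomes a single column name, and check.names
# replaces the ";" characters with "."
print(ncol(old))      # 1
print(colnames(old))  # "sampleId.bodySite.subject"

# Reading the same content with the matching separator gives three columns
fixed <- read.table(
    text = csv_text,
    sep = ";",
    header = TRUE,
    check.names = TRUE
)
print(colnames(fixed))  # "sampleId" "bodySite" "subject"
```

With a single merged column name, a lookup for a column called sampleId has nothing to match against, even though that name appears in the file.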
The Task will now try to determine the separator automatically, using the fread function from the data.table package (new version):
metadata <- fread(metadata_csv_path, data.table=FALSE)
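As a rough check of the new behaviour, the sketch below writes a small semicolon-separated file to a temporary path and reads it with the same fread call. It assumes the data.table package is installed; the file content and the column names bodySite and subject are hypothetical.

```r
library(data.table)

# Hypothetical semicolon-separated metadata written to a temporary file
metadata_csv_path <- tempfile(fileext = ".csv")
writeLines(
    c(
        "sampleId;bodySite;subject",
        "01Sat1;gut;A",
        "02Sat1;tongue;B"
    ),
    metadata_csv_path
)

# Same call as in the new version: fread detects the separator itself,
# and data.table = FALSE makes it return a plain data.frame
metadata <- fread(metadata_csv_path, data.table = FALSE)

print(colnames(metadata))
print(metadata)

# Clean up the temporary file
unlink(metadata_csv_path)
```

Because the separator is detected per file, the same call should also handle a comma-separated metadata.csv without any change.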