I am new to both R and asking questions in related forums, so please bear with me.
I have raw data, output from a geochemical analysis, which contains a large number of observations for over 200 variables. The output of this analysis generates a number of columns containing many types of data.
Of concern for this question are two specific column formats: the analysis outputs two separate columns (variables) for a reading of each element's abundance within a sample. The first is a column containing the magnitude, in PPM, of the abundance of the element (variable) for that sample (observation). The second column is the error in measurement for the abundance of the element. So, for example, it will essentially produce the following (simplified) format:
SampleID | Magnesium Abundance [ppm] | Magnesium error [ppm] |
---|---|---|
A1 | 10530 | 300 |
I have to produce a formal report which I am using the "xlsx" package for. In this report, I have to merge the two columns which report the abundance and the corresponding error of an element as output by the analysis from the format of two separate columns to one column containing the "abundance ± error", like so:
Sample ID | Magnesium Abundance [ppm] |
---|---|
A1 | 10035 ± 250 |
My question is: is what I am attempting to do actually plausible? Is there a way to merge columns to produce a single column containing the "value" and "error" as "value ± error"?
I do not have any reproducible code for this as I am at a complete loss for this part.
Here is the entire code for my project thus far. I am sorry if it is hard to read or see where the problem is as I do not have reproducible code for the current issue I am working on, but I do have this.
In the section "# Import and process data", you can see where I upload my raw data, remove columns of all NA's, and produce a separate table for standardized samples (which check for machine analysis drift). I need to now merge the columns as mentioned above for all elements being measured.
library(xlsx)
library(tidyverse)
library(readxl)
### Create Excel Workbook
# Workbook title and subtitle style
wb <- createWorkbook(type = "xlsx")
wb_title_style <- CellStyle(wb) + Font(wb, color = "blue",
heightInPoints = 14,
name = "Times New Roman",
isBold = T,
underline = 0)
wb_subtitle_style <- CellStyle(wb) + Font(wb, color = "black",
heightInPoints = 12,
name = "Times New Roman",
underline = 0)
# Workbook row and column name styles
wb_rownames_style <- CellStyle(wb) + Font(wb, heightInPoints = 12,
isBold = F,
name = "Times New Roman")
wb_colnames_style <- CellStyle(wb) + Font(wb, heightInPoints = 12,
isBold = T,
name = "Times New Roman") +
Alignment(wrapText = T, horizontal = "ALIGN_CENTER") +
Border(color = "black", position = c("TOP", "BOTTOM"),
pen = c("BORDER_THIN", "BORDER_THICK"))
# Data formatting
format_date <- DataFormat("mm/dd/yyyy")
# Workbook sheets - NOTE: see if there is a way to loop over the number of rows
# from the SampleID column and produce a sheet for each row (i.e. sheet 1 = SampleID 0-3cm)
sheet <- createSheet(wb, sheetName = "PPI-10 Geochem")
# Titling helper function (from: http://www.sthda.com/english/wiki/r-xlsx-package-a-quick-start-guide-to-manipulate-excel-files-in-r)
xlsx.addTitle <- function(sheet, rowIndex, title, titleStyle){
rows <- createRow(sheet, rowIndex = rowIndex)
sheetTitle <- createCell(rows, colIndex = 1)
setCellValue(sheetTitle[[1,1]], title)
setCellStyle(sheetTitle[[1,1]], titleStyle)
}
# Add title and subtitle
xlsx.addTitle(sheet, rowIndex = 1, title = "UConn Sediment Core pXRF Spreadsheet",
titleStyle = wb_title_style) # Be sure not to backquote the value
# for title style, it will produce an error in referencing
xlsx.addTitle(sheet, rowIndex = 2, title = "Project/Core ID:",
titleStyle = wb_subtitle_style)
xlsx.addTitle(sheet, rowIndex = 3, title = "Operator:",
titleStyle = wb_subtitle_style)
xlsx.addTitle(sheet, rowIndex = 4, title = "Date:",
titleStyle = wb_subtitle_style)
# Import and process data
PPI10.pXRF_raw <- as.data.frame(read_xlsx("PPI-10_pXRF.xlsx")) # Import xlsx data as dataframe
PPI10.pXRF <- PPI10.pXRF_raw[, colSums(is.na(PPI10.pXRF_raw)) != nrow(PPI10.pXRF_raw)] #Remove rows of all NAs
PPI10 <- PPI10.pXRF[PPI10.pXRF$`Method Name` != "Cal Check", ] # Remove Cal Check rows
PPI10.calcheck <- PPI10.pXRF[PPI10.pXRF$`Method Name` == "Cal Check", ] # Produce separate Cal Check table
# Append data to workbook
addDataFrame(PPI10, sheet, startRow = 5, startColumn = 1,
colnamesStyle = wb_colnames_style,
rownamesStyle = wb_rownames_style)
setColumnWidth(sheet, colIndex = c(1:ncol(PPI10)), colWidth = 12)
# Save Workbook
saveWorkbook(wb, "R-pXRF-report_test.xlsx")`
You can do this simply using the paste
function and the unicode value for the "±" symbol:
df <- data.frame(value = seq(100, 500, 50),
error = seq(20, 60, 5))
df$new <- paste(df$value, "\u00B1", df$error)
# This also works
# paste(df$value, "±", df$error)
Output:
# value error new
# 1 100 20 100 ± 20
# 2 150 25 150 ± 25
# 3 200 30 200 ± 30
# 4 250 35 250 ± 35
# 5 300 40 300 ± 40
# 6 350 45 350 ± 45
# 7 400 50 400 ± 50
# 8 450 55 450 ± 55
# 9 500 60 500 ± 60
Also, since you said you were new, as a friendly note to posting on this forum, almost all of the text and code your question is unnecessary, and the question is not really concerned with your subject-specific application (I mean that in the nicest way, I promise!). You could have simply stated:
I have a data frame with
values
anderror
in separate columns. I need to combine them into a single column with the "±" symbol (i.e., "1000 ± 150"). Some sample data are:df <- data.frame(values = 10:15, error = 1:5)
The length and noise in your post is likely why it has not yet received an answer. Only hoping this helps you get better, faster help in future posts, and good luck!