rdata-manipulationr-xlsx

How to produce a column containing "values ± error" from two columns with one being "value" and the other being "error"


I am new to both R and asking questions in related forums, so please bear with me.

I have raw data, output from a geochemical analysis, which contains a large number of observations for over 200 variables. The output of this analysis generates a number of columns containing many types of data.

Of concern for this question are two specific column formats: the analysis outputs two separate columns (variables) for a reading of each element's abundance within a sample. The first is a column containing the magnitude, in PPM, of the abundance of the element (variable) for that sample (observation). The second column is the error in measurement for the abundance of the element. So, for example, it will essentially produce the following (simplified) format:

SampleID Magnesium Abundance [ppm] Magnesium error [ppm]
A1 10530 300

I have to produce a formal report which I am using the "xlsx" package for. In this report, I have to merge the two columns which report the abundance and the corresponding error of an element as output by the analysis from the format of two separate columns to one column containing the "abundance ± error", like so:

Sample ID Magnesium Abundance [ppm]
A1 10035 ± 250

My question is: is what I am attempting to do actually plausible? Is there a way to merge columns to produce a single column containing the "value" and "error" as "value ± error"?

I do not have any reproducible code for this as I am at a complete loss for this part.

Here is the entire code for my project thus far. I am sorry if it is hard to read or see where the problem is as I do not have reproducible code for the current issue I am working on, but I do have this.

In the section "# Import and process data", you can see where I upload my raw data, remove columns of all NA's, and produce a separate table for standardized samples (which check for machine analysis drift). I need to now merge the columns as mentioned above for all elements being measured.

library(xlsx)
library(tidyverse)
library(readxl)

### Create Excel Workbook
# Workbook title and subtitle style
wb <- createWorkbook(type = "xlsx")
wb_title_style <- CellStyle(wb) + Font(wb, color = "blue", 
                             heightInPoints = 14, 
                             name = "Times New Roman",
                             isBold = T,
                             underline = 0)
wb_subtitle_style <- CellStyle(wb) + Font(wb, color = "black",
                                heightInPoints = 12, 
                                name = "Times New Roman",
                                underline = 0)

# Workbook row and column name styles
wb_rownames_style <- CellStyle(wb) + Font(wb, heightInPoints = 12,
                                      isBold = F, 
                                      name = "Times New Roman")
wb_colnames_style <- CellStyle(wb) + Font(wb, heightInPoints = 12, 
                                      isBold = T,
                                      name = "Times New Roman") + 
  Alignment(wrapText = T, horizontal = "ALIGN_CENTER") + 
  Border(color = "black", position = c("TOP", "BOTTOM"), 
     pen = c("BORDER_THIN", "BORDER_THICK"))

# Data formatting
format_date <- DataFormat("mm/dd/yyyy")

# Workbook sheets - NOTE: see if there is a way to loop over the number of rows
# from the SampleID column and produce a sheet for each row (i.e. sheet 1 = SampleID 0-3cm)
sheet <- createSheet(wb, sheetName = "PPI-10 Geochem")

# Titling helper function (from: http://www.sthda.com/english/wiki/r-xlsx-package-a-quick-start-guide-to-manipulate-excel-files-in-r)
xlsx.addTitle <- function(sheet, rowIndex, title, titleStyle){
  rows <- createRow(sheet, rowIndex = rowIndex)
  sheetTitle <- createCell(rows, colIndex = 1)
  setCellValue(sheetTitle[[1,1]], title)
  setCellStyle(sheetTitle[[1,1]], titleStyle)
}

# Add title and subtitle
xlsx.addTitle(sheet, rowIndex = 1, title = "UConn Sediment Core pXRF Spreadsheet", 
          titleStyle = wb_title_style) # Be sure not to backquote the value 
                                       # for title style, it will produce an error in   referencing
xlsx.addTitle(sheet, rowIndex = 2, title = "Project/Core ID:",
              titleStyle = wb_subtitle_style)
xlsx.addTitle(sheet, rowIndex = 3, title = "Operator:",
              titleStyle = wb_subtitle_style)
xlsx.addTitle(sheet, rowIndex = 4, title = "Date:",
              titleStyle = wb_subtitle_style)

# Import and process data
PPI10.pXRF_raw <- as.data.frame(read_xlsx("PPI-10_pXRF.xlsx")) # Import xlsx data as dataframe
PPI10.pXRF <- PPI10.pXRF_raw[, colSums(is.na(PPI10.pXRF_raw)) != nrow(PPI10.pXRF_raw)] #Remove rows of all NAs
PPI10 <- PPI10.pXRF[PPI10.pXRF$`Method Name` != "Cal Check", ] # Remove Cal Check rows
PPI10.calcheck <- PPI10.pXRF[PPI10.pXRF$`Method Name` == "Cal Check", ] # Produce separate Cal Check table

# Append data to workbook
addDataFrame(PPI10, sheet, startRow = 5, startColumn = 1,
         colnamesStyle = wb_colnames_style,
         rownamesStyle = wb_rownames_style)
setColumnWidth(sheet, colIndex = c(1:ncol(PPI10)), colWidth = 12)

# Save Workbook
saveWorkbook(wb, "R-pXRF-report_test.xlsx")`

Solution

  • You can do this simply using the paste function and the unicode value for the "±" symbol:

    df <- data.frame(value = seq(100, 500, 50),
                     error = seq(20, 60, 5))
    
    df$new <- paste(df$value, "\u00B1", df$error)
    
    # This also works
    # paste(df$value, "±", df$error)
    

    Output:

    #   value error      new
    # 1   100    20 100 ± 20
    # 2   150    25 150 ± 25
    # 3   200    30 200 ± 30
    # 4   250    35 250 ± 35
    # 5   300    40 300 ± 40
    # 6   350    45 350 ± 45
    # 7   400    50 400 ± 50
    # 8   450    55 450 ± 55
    # 9   500    60 500 ± 60
    

    Also, since you said you were new, as a friendly note to posting on this forum, almost all of the text and code your question is unnecessary, and the question is not really concerned with your subject-specific application (I mean that in the nicest way, I promise!). You could have simply stated:

    I have a data frame with values and error in separate columns. I need to combine them into a single column with the "±" symbol (i.e., "1000 ± 150"). Some sample data are: df <- data.frame(values = 10:15, error = 1:5)

    The length and noise in your post is likely why it has not yet received an answer. Only hoping this helps you get better, faster help in future posts, and good luck!