I am parsing a CSV file with multiple columns. The number of columns is not fixed; it varies from 5 to 10. I need to recreate a data.frame from these columns inside a function. Is there a multiple-arguments feature in R like Ruby's *args? If not, how can I achieve this? I searched a bit and found that if I have column names like
col1
col2
I can use:
list <- ls(pat="^col\\d$")
and pass this list as an argument to a function, but that passes only the column names as character strings, not the values those names refer to.
Any suggestions?
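R's counterpart to Ruby's *args is the ... argument: list(...) collects however many arguments the caller passed. A minimal sketch, assuming each column arrives as a separate vector; the function name make_df and the col1, col2, ... fallback names are hypothetical, chosen for illustration:

```r
# Accepts any number of vectors, similar in spirit to Ruby's *args.
# Named arguments keep their names; unnamed ones are called col1, col2, ...
make_df <- function(...) {
  cols <- list(...)                        # capture all arguments as a list
  if (length(cols) == 0) stop("need at least one column")
  nms <- names(cols)                       # NULL when no argument is named
  if (is.null(nms)) nms <- rep("", length(cols))
  unnamed <- nms == ""
  nms[unnamed] <- paste0("col", which(unnamed))
  names(cols) <- nms
  data.frame(cols, stringsAsFactors = FALSE)
}

df <- make_df(c(1, 2, 3), c(4, 5, 6), id = c("a", "b", "c"))
names(df)    # "col1" "col2" "id"

# do.call() plays the role of Ruby's splat at the call site:
# it passes each element of a list as a separate argument.
df2 <- do.call(make_df, list(1:2, 3:4))
names(df2)   # "col1" "col2"
```

Because the number of arguments is only known at run time, the same function works for 5 columns or 10.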
Edit:
I am parsing the file in a Ruby on Rails app and using the RinRuby gem to call R functions. So I parse the CSV in Ruby and pass each column's contents to R as a separate variable. In R, I then need to create a data.frame; the data is not a data.frame to begin with. In the cal_norm method below, I assign the R variables in a loop, with names col0, col1, col2, ... and so on.
Here is the Rails code:
class UploadsController < ApplicationController
  attr_accessor :calib_data, :calib_data_transpose, :inten_data, :pr_list

  def index
    @uploads = Upload.all
    @upload = Upload.new
    respond_to do |format|
      format.html
      format.json { render json: @uploads }
    end
  end

  def create
    @upload = Upload.new(params[:upload])
    directory = "public/"
    io_calib = params[:upload][:calib]
    io_inten = params[:upload][:inten]
    name_calib = io_calib.original_filename
    name_inten = io_inten.original_filename
    calib_path = File.join(directory, "calibs", name_calib)
    inten_path = File.join(directory, "intens", name_inten)
    respond_to do |format|
      if @upload.save
        @calib_data, @calib_data_transpose = import(calib_path)
        @inten_data = import_ori(inten_path)
        # probe list of the uploaded file
        @probe_list = calib_data_transpose[0]
        logger.debug @probe_list.to_s
        flash[:notice] = "Files were successfully uploaded!!"
        format.html
        #format.js #{ render json: @upload, status: :created, location: @upload }
      else
        flash[:notice] = "Error in uploading!!"
        format.html { render action: "index" }
        format.json { render json: @upload.errors, status: :unprocessable_entity }
      end
    end
  end

  def cal_norm
    # ajax request
    # NOTE: Rails builds a new controller instance for every request, so the
    # instance variables set in #create are nil here unless they are reloaded.
    data = params['data'].split(',')
    for i in 0..@calib_data_transpose.length - 1
      R.assign "col#{i}", @calib_data_transpose[i]
    end
    R.assign "cells", @inten_data
    R.assign "pr", data
    R.eval <<-EOF
      # make sure to convert them to character and numeric vectors
      # match the selected pr in the table
      # convert the found row of values from data.frame to numeric
      # divide each column of the table by the respective pr values and create a new table; repeat it with different pr
      # make a new table with the ce count and different probe normalization and calculate for individual pr
      # finally return a data.frame with pr names and cell counts
      # return individual columns as an array, not in the form of a matrix/data.frame
    EOF
  end

  # Returns the rows split into fields, plus the transposed (column-wise) arrays
  def import(file_path)
    array = import_ori(file_path)
    array_splitted = array.map {|a| a.split(",")}
    array_transpose = array_splitted.transpose
    return array_splitted, array_transpose
  end

  # Reads the file as lines and drops the header line
  def import_ori(file_path)
    string = IO.read(file_path)
    array = string.split("\n")
    array.shift
    return array
  end
end
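The R.eval block above only lists the intended steps as comments. One possible reading of those steps, as a hedged R sketch: it assumes the calibration table is a data.frame with a character column named probe plus value columns (possibly still strings, as they arrive from Ruby), and that each selected probe's row supplies the divisor for every column. The function name normalize_by_probe, the probe column name, and the sample data are hypothetical, not from the original code.

```r
# Hypothetical sketch of the normalization outlined in the R.eval comments.
# Assumptions: `calib` has a character column "probe" plus value columns,
# and `pr` holds the names of the selected probes.
normalize_by_probe <- function(calib, pr) {
  value_cols <- setdiff(names(calib), "probe")

  # Convert the value columns to a numeric matrix (strings -> numbers)
  m <- matrix(as.numeric(as.matrix(calib[value_cols])),
              nrow = nrow(calib),
              dimnames = list(NULL, value_cols))

  result <- lapply(pr, function(p) {
    row <- match(p, calib$probe)            # locate the selected probe's row
    if (is.na(row)) stop("probe not found: ", p)
    ref <- m[row, ]                         # that probe's value in each column
    normalized <- sweep(m, 2, ref, "/")     # divide each column by its ref value
    data.frame(probe = calib$probe, normalized, stringsAsFactors = FALSE)
  })
  names(result) <- pr                       # one normalized table per probe
  result
}

calib <- data.frame(probe = c("p1", "p2", "p3"),
                    s1 = c("2", "4", "8"),
                    s2 = c("10", "5", "20"),
                    stringsAsFactors = FALSE)
out <- normalize_by_probe(calib, "p2")
out$p2
```

Returning a named list keeps one table per selected probe; if a single combined table is needed instead, do.call(rbind, ...) over the list is one option.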
Following the updated question:
I am an utter Ruby newbie, but I found this example here: col wise data
Here the column-wise data is read into col_data, and the 0 is the column index (I have no Ruby install for testing :( )
require 'csv'
col_data = []
CSV.foreach(filename) {|row| col_data << row[0]}
Assign the column data to variables col1 ... coln, and store the number of columns in a counter:
rows = CSV.read(filename)
rows.shift                              # drop the header row
columns = rows.transpose                # one array per CSV column
columns.each_with_index do |values, i|
  R.assign "col#{i + 1}", values        # creates col1 ... coln in R
end
R.assign "col_count", columns.length    # number of columns, used by the R loop
Once col1 ... coln are created, combine the column data one index at a time, starting at i = 1. The result will be a data.frame with its columns in the order col1 ... coln.
R.eval <<-EOF
  for (i in 1:col_count) {
    if (i == 1) {
      df <- data.frame(get(paste0("col", i)))
    } else {
      df <- cbind(df, get(paste0("col", i)))
    }
    names(df)[i] <- paste0("col", i)
  }
EOF
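The same data.frame can also be built without growing it one column at a time. A minimal sketch, assuming col1 ... coln and col_count already exist in the R session (the sample values below are made up to stand in for what RinRuby assigns): mget() looks up several variables by name and returns a named list, which data.frame() turns into columns.

```r
# Made-up stand-ins for the vectors RinRuby assigns from Ruby
col1 <- c("p1", "p2", "p3")
col2 <- c("1.5", "2.0", "3.5")
col3 <- c("10", "20", "30")
col_count <- 3

# Build the variable names in order, then fetch all their values at once;
# mget() returns a named list (col1 = ..., col2 = ..., col3 = ...)
col_names <- paste0("col", seq_len(col_count))
cols <- mget(col_names)

# Each list element becomes a column named after the variable
df <- data.frame(cols, stringsAsFactors = FALSE)
str(df)
```

mget() also addresses the original problem of ls(pat=...) returning only names: mget(ls(pattern = "^col\\d$")) returns the values behind those names. Note that ls() sorts the names as strings, so a col10 would come before col2; generating the names with paste0() keeps the intended order.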
Let us know if this helps...
Not very relevant to the updated question anymore, but retained for posterity.
Subset data.frame for a given pattern
As Roland stated above, read.csv will read the entire file. Since you wish to control which columns are retained in the data.frame, you could do the following:
Using data(mtcars) as the sample data.frame:
Code:
Read in the data:
> data(mtcars)
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Subset the data by some condition, say, columns whose names begin with the letter 'c':
> head(mtcars[,grep("^c",colnames(mtcars))])
                  cyl carb
Mazda RX4           6    4
Mazda RX4 Wag       6    4
Datsun 710          4    1
Hornet 4 Drive      6    1
Hornet Sportabout   8    2
Valiant             6    1
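One pitfall with this kind of subsetting: if the pattern matches exactly one column, mtcars[, idx] returns a plain vector instead of a data.frame. A small sketch that keeps the result a data.frame by passing drop = FALSE; the helper name select_matching is hypothetical:

```r
data(mtcars)

# Keep only the columns whose names match a regular expression.
# drop = FALSE keeps the result a data.frame even when only one column matches.
select_matching <- function(df, pattern) {
  df[, grepl(pattern, colnames(df)), drop = FALSE]
}

head(select_matching(mtcars, "^c"))                  # cyl and carb, as above
class(mtcars[, grep("^mp", colnames(mtcars))])       # "numeric": dropped to a vector
class(select_matching(mtcars, "^mp"))                # "data.frame"
```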
Here '^c' is similar to the pattern pat="^col\\d$" from your question. You could substitute '^c' with any regular expression of your choice, e.g. '^col'. The '^c' pattern matches any name beginning with the letter 'c'; to match at the end of the string, put the anchor after the character, as in 'c$'.
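To illustrate the anchors, a quick check against the mtcars column names; grep(..., value = TRUE) returns the matching names rather than their positions. The names col10 and xcol3 are made up to show where ^col\\d$ stops matching:

```r
data(mtcars)
cols <- colnames(mtcars)

grep("^c", cols, value = TRUE)    # names starting with "c": "cyl" "carb"
grep("c$", cols, value = TRUE)    # names ending with "c": "qsec"

candidates <- c("col1", "col2", "col10", "xcol3")
grep("^col\\d$", candidates, value = TRUE)    # exactly one digit: "col1" "col2"
grep("^col\\d+$", candidates, value = TRUE)   # one or more digits: "col1" "col2" "col10"
```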