Import Data into a Spectra Object — files2SpectraObject • ChemoSpec

These functions import data into a Spectra object. For "csv-like" files they use read.table, so they are very flexible in regard to file formatting. Be sure to see the ... argument below for important details you need to provide. files2SpectraObject can also read JCAMP-DX files and will do so if fileExt is any of "dx", "DX", "jdx" or "JDX".

files2SpectraObject(
  gr.crit = NULL,
  gr.cols = "auto",
  freq.unit = "no frequency unit provided",
  int.unit = "no intensity unit provided",
  descrip = "no description provided",
  fileExt = "\\.(csv|CSV)$",
  out.file = "mydata",
  debug = FALSE,
  ...
)

matrix2SpectraObject(
  gr.crit = NULL,
  gr.cols = c("auto"),
  freq.unit = "no frequency unit provided",
  int.unit = "no intensity unit provided",
  descrip = "no description provided",
  in.file = NULL,
  out.file = "mydata",
  chk = TRUE,
  ...
)

Arguments

gr.crit

Group Criteria. A vector of character strings which will be searched for among the file/sample names in order to assign an individual spectrum to group membership. This is done using grep, so characters like "." (period/dot) do not have their literal meaning (see below). Warnings are issued if there are file/sample names that don't match entries in gr.crit or there are entries in gr.crit that don't match any file names.

gr.cols

Group Colors. See colorSymbol for some options. One of the following:

Legacy behavior and the default: The word "auto", in which case up to 8 colors will be automatically assigned from package RColorBrewer Set1.
"Col7". A unique set of up to 7 colorblind-friendly colors is used.
"Col8". A unique set of up to 8 colors is used.
"Col12". A mostly paired set of up to 12 colors is used.
A vector of acceptable color designations with the same length as gr.crit.

Colors will be assigned one for one, so the first element of gr.crit is assigned the first element of gr.cols and so forth. For Col12 you should pay careful attention to the order of gr.crit in order to match up colors.

freq.unit

A character string giving the units of the x-axis (frequency or wavelength).

int.unit

A character string giving the units of the y-axis (some sort of intensity).

descrip

A character string describing the data set that will be stored. This string is used in some plots so it is recommended that its length be less than about 40 characters.

fileExt

A character string giving the extension of the files to be processed. Regular expressions can be used. For instance, the default finds files with either ".csv" or ".CSV" as the extension. Matching is done via a grep process, which is greedy. See also the "Advanced Tricks" section.

out.file

A file name. The completed object of S3 class Spectra will be written to this file.

debug

Logical. Applies to files2SpectraObject only. Set to TRUE for troubleshooting when an error is thrown during import. In addition, values of 1-5 will work when importing a JCAMP-DX file via fileExt = "\\.jdx" etc. These values are passed through to the readJDX function. See there for much more info on importing JCAMP-DX files.

...

Arguments to be passed to read.table, list.files or readJDX; see the "Advanced Tricks" section. For read.table, you MUST supply values for sep, dec and header consistent with your file structure, unless they are the same as the defaults for read.table.

in.file

Character. Applies to matrix2SpectraObject only. Input file name, including extension. Can be a vector of file names.

chk

Logical. Applies to matrix2SpectraObject only. Should the Spectra object be checked for integrity? If you are having trouble importing your data, set this to FALSE and do str(your object) to troubleshoot.

Value

An object of class Spectra. An unnamed object of S3 class Spectra is also written to out.file. To read it back into the workspace, use new.name <- loadObject(out.file) (loadObject is in package R.utils).

Functions

files2SpectraObject(): Import data from separate csv files
matrix2SpectraObject(): Import a matrix of data

files2SpectraObject

files2SpectraObject acts on all files in the current working directory with the specified fileExt so there should be no extra files of that type hanging around (except see next paragraph). The first column should contain the frequency values and the second column the intensity values. The files may have a header or not (supply header = TRUE/FALSE as necessary). The frequency column is assumed to be the same in all files.
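As a minimal sketch of why sep, dec and header matter, the base R snippet below writes a small two-column file that uses a semicolon separator and a comma as the decimal mark, then reads it back with read.table. The file name, column names and values are made up for illustration; the same three arguments would be supplied through ... in a call to files2SpectraObject.

```r
# Hypothetical two-column file: frequency;intensity, with a header row
tmp <- tempfile(fileext = ".csv")
writeLines(c("freq;int", "1000,5;0,12", "1001,0;0,15", "1001,5;0,11"), tmp)

# With the read.table defaults (sep = "", dec = ".", header = FALSE)
# every line is read as a single field, so only one column results
bad <- read.table(tmp)

# Supplying values consistent with the file structure gives two numeric columns
good <- read.table(tmp, sep = ";", dec = ",", header = TRUE)
str(good)
```

Inspecting a single file this way before importing a whole directory is a quick check that the chosen sep, dec and header values match the files.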

If fileExt contains any of "dx", "DX", "jdx" or "JDX", then the files will be processed by readJDX. Consider setting debug = TRUE, or debug = 1 etc for this format, as there are many options for JCAMP, and many are untested. See readJDX for options and known limitations.

matrix2SpectraObject

This function takes one or more csv-like files, containing frequencies in the first column and samples in additional columns, and processes them into a Spectra object. Each file MUST have a header row which includes the sample names. There need not be a header for the first (frequency) column. If more than one file is given, they must all have the same frequency entries.
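The layout described above can be sketched with base R alone. The sample names, values and file name below are hypothetical, and the commented-out matrix2SpectraObject call at the end is an untested illustration that uses only the arguments documented on this page.

```r
# Hypothetical matrix-style data: the first column holds frequencies,
# the remaining columns hold one sample each
dat <- data.frame(
  freq      = seq(1000, 1004, by = 1),
  Control_1 = c(0.10, 0.12, 0.30, 0.11, 0.09),
  Control_2 = c(0.11, 0.13, 0.28, 0.12, 0.10),
  Sample_1  = c(0.09, 0.20, 0.45, 0.18, 0.08)
)

# Write it out with a header row that carries the sample names
in_file <- file.path(tempdir(), "matrix_demo.csv")
write.table(dat, in_file, sep = ",", row.names = FALSE, quote = FALSE)

# Read it back the way read.table would see it and pull out the sample names
chk_dat <- read.table(in_file, sep = ",", header = TRUE)
sample_names <- colnames(chk_dat)[-1]
sample_names

# A call along these lines would then import the file (not run; needs ChemoSpec):
# spec <- matrix2SpectraObject(
#   gr.crit = c("Control", "Sample"), in.file = in_file,
#   sep = ",", dec = ".", header = TRUE
# )
```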

gr.crit and Sample Name Gotchas

The matching of gr.crit against the sample file names (in files2SpectraObject) or column headers/sample names (in matrix2SpectraObject) is done one at a time, in order, using grep. While powerful, this has the potential to lead to some "gotchas" in certain cases, noted below.

Your file system may allow file/sample names that R will not like and that will cause confusing behavior. File/sample names become variables in ChemoSpec, and R does not like things like "-" (minus sign or hyphen) in file/sample names. A hyphen is converted to a period (".") if found, which is fine for a variable name. However, a period in gr.crit is interpreted from the grep point of view, namely a period matches any single character. At this point, things may behave very differently than one might hope. See make.names for allowed characters in R variables and make sure your file/sample names comply.
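The hyphen and period behavior can be seen with base R alone. The sample names below are hypothetical, and the snippet calls make.names and grep directly rather than going through ChemoSpec.

```r
# Hypothetical sample names as they might appear on disk
raw_names <- c("Control-1", "ControlA1", "Sample-1")

# make.names replaces characters that are not valid in R names with "."
clean <- make.names(raw_names)
clean

# An unescaped period matches any single character,
# so this pattern also picks up "ControlA1"
loose <- grep("Control.1", clean, value = TRUE)

# Escaping the period restricts the match to a literal "."
strict <- grep("Control\\.1", clean, value = TRUE)
```

Here loose contains both "Control.1" and "ControlA1", while strict contains only "Control.1".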

The entries in gr.crit must be mutually exclusive. For example, if you have files with names like "Control_1" and "Sample_1" and use gr.crit = c("Control", "Sample"), groups will be assigned as you would expect. But if you have file names like "Control_1_Shade" and "Sample_1_Sun" you can't use gr.crit = c("Control", "Sample", "Sun", "Shade") because each criterion is grepped in order, and the "Sun/Shade" phrases, being last, will form the basis for your groups. Because this is a grep process, you can get around this by using regular expressions in your gr.crit argument to specify the desired groups in a mutually exclusive manner. In this second example, you could use gr.crit = c("Control(.*)Sun", "Control(.*)Shade", "Sample(.*)Sun", "Sample(.*)Shade") to have your groups assigned based upon both phrases in the file names.
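The effect of overlapping criteria can be illustrated with a small stand-in for the matching step. The helper assign_groups below is hypothetical and is not ChemoSpec's implementation; it only assumes, as described above, that each criterion is grepped in order and that later matches replace earlier ones.

```r
# Hypothetical sample names carrying two factors each
nms <- c("Control_1_Shade", "Control_2_Sun", "Sample_1_Sun", "Sample_2_Shade")

# Illustrative helper (not part of ChemoSpec): grep each criterion in order,
# letting later matches overwrite earlier ones
assign_groups <- function(sample_names, gr.crit) {
  groups <- rep(NA_character_, length(sample_names))
  for (crit in gr.crit) {
    groups[grep(crit, sample_names)] <- crit
  }
  groups
}

# Overlapping criteria: the trailing "Sun"/"Shade" patterns win
overlapping <- assign_groups(nms, c("Control", "Sample", "Sun", "Shade"))

# Mutually exclusive regular expressions: each name matches exactly one pattern
exclusive <- assign_groups(
  nms,
  c("Control(.*)Sun", "Control(.*)Shade", "Sample(.*)Sun", "Sample(.*)Shade")
)
```

With the overlapping criteria every sample ends up labeled only "Sun" or "Shade", whereas the mutually exclusive patterns give four distinct groups.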

To summarize, gr.crit is used as a grep pattern, and the file/sample names are the target. Make sure your file/sample names comply with make.names.

Finally, samples whose names are not matched using gr.crit are still incorporated into the Spectra object, but they are not assigned a group or color. Therefore they don't plot, but they do take up space in a plot! A warning is issued in these cases, since one wouldn't normally want a spectrum to be orphaned this way.

All these problems can generally be identified by running sumSpectra once the data is imported.

Advanced Tricks

The ... argument can be used to pass any argument to read.table or list.files. This includes the possibility of passing arguments that will cause trouble later, for instance na.strings in read.table. While one might successfully read in data with NA, it will eventually cause problems. The intent of this feature is to allow one to recurse a directory tree containing the data, and/or to specify a starting point other than the current working directory. So for instance if the current working directory is not the directory containing the data files, you can use path = "my_path" to point to the desired top-level directory, and recursive = TRUE to work your way through a set of subdirectories. In addition, if you are reading in JCAMP-DX files, you can pass arguments to readJDX via ..., e.g. SOFC = FALSE. Finally, while argument fileExt appears to be a file extension (from its name and the description elsewhere), it's actually just a grep pattern that you can apply to any part of the file name if you know how to construct the proper pattern.
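To see what path, recursive and a custom pattern do, the sketch below builds a small directory tree in a temporary location and lists it with list.files directly. The directory and file names are hypothetical, and the snippet assumes that fileExt is used in the same way as the pattern argument of list.files.

```r
# Build a small hypothetical directory tree in a temporary location
top <- file.path(tempdir(), "my_path")
dir.create(file.path(top, "batch1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(top, "batch2"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(top, "batch1", c("Control_1.csv", "notes.txt")))
file.create(file.path(top, "batch2", c("Sample_1.CSV", "Sample_1_old.csv")))

# path and recursive are list.files arguments; pattern plays the role of fileExt
all_csv <- list.files(path = top, pattern = "\\.(csv|CSV)$", recursive = TRUE)

# Because the pattern is just a regular expression, it can also target other
# parts of the file name, e.g. only files whose name starts with "Sample_1."
subset_csv <- list.files(path = top, pattern = "^Sample_1\\.", recursive = TRUE)
```

The first listing finds the three csv files in both subdirectories and skips notes.txt; the second narrows the selection to a single file without touching the extension.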

Author

Bryan A. Hanson (DePauw University).

Examples

if (FALSE) { # \dontrun{
# This example assumes the graphics output is set to ggplot2 (see ?GraphicsOptions).
library("ggplot2")

wd <- getwd() # save current location
setwd(tempdir())

# Grab an included file & move to a temporary directory
tf <- system.file("extdata/PCRF.jdx", package = "ChemoSpec")
chk <- file.copy(from = tf, to = basename(tf))

# Now read in the file, summarize and plot
spec <- files2SpectraObject(
  gr.crit = "PCRF", freq.unit = "ppm", int.unit = "intensity",
  descrip = "test import", fileExt = "\\.jdx"
)
sumSpectra(spec)
p <- plotSpectra(spec, lab.pos = 3.5, main = "Reduced Fat Potato Chip")
p <- p + ggtitle("Reduced Fat Potato Chip")
p

setwd(wd) # restore working directory
} # }