## readJDX is tested with > 175 files before release.
## However, there are still challenging files out in the wild.
## If you have trouble importing a file, please file an issue at
## github.com/bryanhanson/readJDX/issues

This vignette is based on readJDX version 0.6.4.

1 Background

The JCAMP-DX format was developed as a manufacturer-independent means of sharing spectroscopic data. The standard is described in a series of publications (McDonald and Wilks 1988; Grasselli 1991; A. Davies and Lampen 1993; Lampen et al. 1994, 1999; Baumbach et al. 2001; Cammack et al. 2006; Woollett et al. 2012). There is a recent overview of the standard (A. N. Davies et al. 2022). JCAMP-DX was developed during a time when data storage was expensive, and hence makes extensive use of compression schemes. The original application was to IR spectroscopy, but the standard has evolved over time to accommodate other spectroscopies.

2 File Structure

JCAMP-DX files consist of two parts:

  • A more-or-less human-readable set of metadata, which is needed to understand the data and to verify the accuracy of any needed decompression. Besides the required basic information about the data itself, most files contain instrument- and manufacturer-specific parameters in the metadata.
  • A variable list, compressed in various ways.
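The two-part structure can be illustrated with a minimal Python sketch. It is not how readJDX works internally; it only assumes the general JCAMP-DX conventions that metadata records look like `##LABEL=value`, that `$$` starts a comment, and that the variable list follows one of a few data labels and runs until `##END=`. The names `split_jdx` and `DATA_LABELS` are hypothetical, and multi-line metadata values are ignored for brevity.

```python
# Labels that introduce a variable list (assumed set, for illustration only).
DATA_LABELS = {"XYDATA", "XYPOINTS", "PEAK TABLE", "DATA TABLE"}

def split_jdx(text):
    """Split JCAMP-DX text into a metadata dict and a list of raw data lines."""
    metadata = {}
    data_lines = []
    in_data = False
    for raw in text.splitlines():
        line = raw.split("$$")[0].rstrip()  # drop "$$" comments
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                in_data = False
                continue
            metadata[label] = value.strip()
            # Lines after a data label belong to the variable list.
            in_data = label in DATA_LABELS
        elif in_data:
            data_lines.append(line)
        # Continuation lines of metadata values are skipped in this sketch.
    return metadata, data_lines
```

For a real file, the metadata dictionary would hold entries such as the title and the variable list format, while the data lines would still need decompression.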

3 Challenges When Reading Files

The JCAMP-DX standard allows a lot of flexibility and instrument manufacturers have written widely varying export functions. Some of the challenges in reading a JCAMP-DX file include:

  • JCAMP-DX files can contain different kinds of data, including non-spectroscopic data (Gasteiger et al. 1991) and more than one type of spectroscopic data.
  • JCAMP-DX files can contain more than one spectrum in the file.
  • Instruments may be configured to use . or , as the decimal point when writing files. This is generally a geographical / cultural nuance.
  • Numbers may be written using E to signify exponent, but only in some compression formats.
  • The variable list can be presented in several possible formats.
  • Some manufacturers take liberties with the required format.
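As a small illustration of the decimal mark issue, the Python sketch below parses one uncompressed, space-separated line. It assumes the caller already knows which decimal mark the file uses; the function name `parse_affn_line` and its `decimal_mark` parameter are hypothetical and are not part of readJDX. Note that blindly replacing commas is only safe when the comma is not also used to separate \(x\),\(y\) pairs.

```python
def parse_affn_line(line, decimal_mark="."):
    """Parse a line of space-separated numbers, honoring the decimal mark."""
    if decimal_mark == ",":
        # Normalize "400,5" to "400.5" before conversion.
        line = line.replace(",", ".")
    # float() also accepts exponents written with E, such as "1.25E2".
    return [float(tok) for tok in line.split()]
```

For example, `parse_affn_line("400,5 1,25E2 -3,0", decimal_mark=",")` yields the values 400.5, 125.0 and -3.0.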

4 Supported Formats

  • Variable lists can be presented in several different formats. The supported formats are:
    • XYDATA=(X++(Y..Y)) Each line starts with an \(x\) value, followed by as many \(y\) values as fit within the limit of 80 characters per line. The \(x\) value that starts the next line is incremented according to the \(x\) resolution and the number of \(y\) values that fit on the previous line (which in turn depends upon the compression scheme).
    • DATA TABLE=(X++(R..R)) As above. The real data from a 1D NMR spectrum.
    • DATA TABLE=(X++(I..I)) As above. The imaginary data from a 1D NMR spectrum.
    • DATA TABLE=(F2++(Y..Y)) As above. Format used for the slices of a 2D NMR spectrum.
    • PEAK TABLE=(XY..XY) Entries are \(x\),\(y\) pairs separated by spaces or semicolons. No compression is used. Used for example for single MS spectra.
    • DATA TABLE=(XI..XI) Entries are \(x\),\(y\) pairs separated by spaces or semicolons. No compression is used. Used for instance in LC-MS data sets.
    • XYPOINTS= (XY..XY) Entries are \(x\),\(y\) pairs separated by spaces or semicolons. No compression is used. Used for many types of spectra.
  • Within a variable list, several different compression schemes can be employed. The following are supported:
    • AFFN: ASCII numbers separated by at least one space, or + or -.
    • PAC: Numbers separated by exactly one space, + or -.
    • SQZ: Delimiter, leading digit and sign are replaced by a pseudo-digit. A pseudo-digit is typically a letter.
    • DIF: The first \(y\) value on a line is written with a SQZ pseudo-digit; each subsequent entry is the difference between a \(y\) value and the one before it, written with its own set of pseudo-digits. Sometimes referred to as SQZDIF.
    • DUP: Not a format, but a method of signifying repeated values.
    • DIFDUP: A combination of DIF and DUP. Widely used, as it permits the greatest amount of compression.
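To make the compression schemes concrete, here is a minimal Python sketch that decodes one line of an X++(Y..Y) variable list. It is not readJDX's implementation. It assumes the usual pseudo-digit tables: `@`, `A`-`I` and `a`-`i` for SQZ; `%`, `J`-`R` and `j`-`r` for DIF; `S`-`Z` and `s` for DUP, where the DUP count includes the original entry. The names `decode_line`, `TOKEN` and `_value` are hypothetical. Exponents written with E are not handled, since E would be mistaken for a SQZ pseudo-digit.

```python
import re

# Pseudo-digit tables: each character encodes a sign and a leading digit.
SQZ = {c: i for i, c in enumerate("@ABCDEFGHI")}
SQZ.update({c: -i for i, c in enumerate("abcdefghi", start=1)})
DIF = {c: i for i, c in enumerate("%JKLMNOPQR")}
DIF.update({c: -i for i, c in enumerate("jklmnopqr", start=1)})
DUP = {c: i for i, c in enumerate("STUVWXYZs", start=1)}

# A token is a pseudo-digit plus trailing digits, or a plain signed number.
TOKEN = re.compile(r"[@A-Ia-i%J-Rj-rS-Zs][0-9.]*|[+-]?[0-9.]+")

def _value(lead, rest):
    """Combine the digit and sign from a pseudo-digit with the remaining digits."""
    sign = -1 if lead < 0 else 1
    return sign * float(f"{abs(lead)}{rest}")

def decode_line(line):
    """Return (x, [y values]) for one AFFN, PAC, SQZ, DIF or DIFDUP line."""
    tokens = TOKEN.findall(line)
    x = float(tokens[0])
    ys = []
    last_diff = None  # set while the previous entry was a DIF value
    for tok in tokens[1:]:
        lead, rest = tok[0], tok[1:]
        if lead in SQZ:
            ys.append(_value(SQZ[lead], rest))
            last_diff = None
        elif lead in DIF:
            last_diff = _value(DIF[lead], rest)
            ys.append(ys[-1] + last_diff)
        elif lead in DUP:
            count = int(f"{DUP[lead]}{rest}")
            # Repeat the previous entry: a difference if it was DIF, else a value.
            for _ in range(count - 1):
                ys.append(ys[-1] + (last_diff or 0.0))
        else:
            ys.append(float(tok))  # plain AFFN or PAC number
            last_diff = None
    return x, ys
```

With this sketch, the lines `100 1 2 3 3 2 1 0 -1 -2 -3` (AFFN), `100ABCCBA@abc` (SQZ) and `100AJT%jX` (DIFDUP) all decode to the same ten \(y\) values, which shows how much shorter the DIFDUP form can be.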

5 Formats That are Not Supported

  • Mixed spectroscopic types and non-spectroscopic entries (such as structures) are not supported by readJDX and will not be supported in the future.
  • Compound files: JCAMP-DX files may contain more than one spectrum in the file. The following JCAMP-DX standards require a compound file and are therefore not directly supported; however, the utility function splitMultiblockDX can separate these compound files into individual files that can be read by readJDX:
  • readJDX is geared toward raw spectral data. Therefore variable list formats representing derived information like PEAK ASSIGNMENTS are not supported (but your pull requests are welcomed!).

6 Practical Matters

readJDX tries its best to deal with all these options. If you have a file that you believe should be supported but gives an error, please file an issue at GitHub. Be sure to attach the file that is giving you problems.

Before release, readJDX is tested against a large collection of files with varying formats. A few of these files were obtained locally. Others were collected from publicly available sources (e.g. www.jcamp-dx.org). These files are not included with the package, both to save space and because, although they are publicly available, the licensing status of many of them is unclear (i.e. the OWNER entry).

The JCAMP standard requires a number of checks on the integrity of the data decompression process. readJDX implements most of these either directly or indirectly. Verification is important: in the process of checking integrity, we have found JCAMP files that were not written correctly. For details about how data decompression is checked, please see the original source files.
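One such integrity check can be sketched in Python. The sketch assumes the DIF convention that the last \(y\) value of each line is repeated as the first \(y\) value of the next line (the "Y value check"). It is an illustration only, not readJDX's code, and the name `merge_dif_lines` and the tolerance `tol` are hypothetical.

```python
def merge_dif_lines(lines_y, tol=1e-9):
    """Join per-line y values, verifying and dropping the repeated check value.

    lines_y is a list with one list of already-decoded y values per line.
    """
    merged = list(lines_y[0])
    for n, ys in enumerate(lines_y[1:], start=2):
        # The first value of this line must repeat the last value so far.
        if abs(ys[0] - merged[-1]) > tol:
            raise ValueError(f"Y value check failed at line {n}")
        merged.extend(ys[1:])  # skip the duplicated check value
    return merged
```

A file whose accumulated differences drift away from the true values, for instance because of a corrupted character, would fail this check at the first affected line instead of silently producing a distorted spectrum.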

References

Baumbach, JI, AN Davies, P Lampen, and H Schmidt. 2001. “JCAMP-DX. A Standard Format for the Exchange of Ion Mobility Spectrometry Data - (IUPAC recommendations 2001).” Pure and Applied Chemistry 73 (11): 1765–82. https://www.degruyter.com/document/doi/10.1351/pac200173111765/html.
Cammack, R, Y Fann, RJ Lancashire, JP Maher, PS McIntyre, and R Morse. 2006. “JCAMP-DX for electron magnetic resonance (EMR).” Pure and Applied Chemistry 78 (3): 613–31. https://www.degruyter.com/document/doi/10.1351/pac200678030613/html.
Davies, AN, and P Lampen. 1993. “JCAMP-DX for NMR.” Applied Spectroscopy 47 (8): 1093–99.
Davies, Antony N., Robert M. Hanson, Peter Lampen, and Robert J. Lancashire. 2022. “An Overview of the JCAMP-DX Format.” Pure and Applied Chemistry 94 (6): 705–23. https://www.degruyter.com/document/doi/10.1515/pac-2021-2010/html.
Gasteiger, J., B. M. P. Hendricks, P. Hoever, C. Jochum, and H. Somberg. 1991. “JCAMP-CS: A Standard Exchange Format for Chemical Structure Information in a Computer-Readable Form.” Applied Spectroscopy 45 (1): 4–11.
Grasselli, JG. 1991. “JCAMP-DX, A Standard Format for Exchange of Infrared-Spectra in Computer Readable Form.” Pure and Applied Chemistry 63 (12): 1781–92. https://www.degruyter.com/document/doi/10.1351/pac199163121781/html.
Lampen, P, H Hillig, AN Davies, and M Linscheid. 1994. “JCAMP-DX for Mass Spectrometry.” Applied Spectroscopy 48 (12): 1545–52.
Lampen, P, J Lambert, RJ Lancashire, RS McDonald, PS McIntyre, DN Rutledge, T Frohlich, and AN Davies. 1999. “An Extension to the JCAMP-DX Standard File Format, JCAMP-DX V.5.01 (IUPAC Recommendations 1999).” Pure and Applied Chemistry 71 (8): 1549–56. https://www.degruyter.com/document/doi/10.1351/pac199971081549/html.
McDonald, RS, and PA Wilks. 1988. “JCAMP-DX, A Standard Format for Exchange of Infrared-Spectra in Computer Readable Form.” Applied Spectroscopy 42 (1): 151–62.
Woollett, Benjamin, Daniel Klose, Richard Cammack, Robert W. Janes, and B. A. Wallace. 2012. “JCAMP-DX for circular dichroism spectra and metadata (IUPAC Recommendations 2012).” Pure and Applied Chemistry 84 (10): 2171–82. https://www.degruyter.com/document/doi/10.1351/PAC-REC-12-02-03/html.

  1. Professor Emeritus of Chemistry & Biochemistry, DePauw University, Greencastle IN USA.