Robust Deconvolution of Complex Mixtures by Covariance TOCSY Spectroscopy
Fengli Zhang and Rafael Brüschweiler
Department of Chemistry and Biochemistry, Florida State University and National High Magnetic Field Laboratory (NHMFL), Tallahassee, FL
Summary
A recently proposed unsupervised deconvolution method[1] uses principal-component analysis (PCA) of the covariance TOCSY spectrum of a mixture. In the absence of significant spectral overlap, the dominant PCA eigenmodes closely approximate the 1D spectra of the individual components. For increasing degrees of spectral overlap between components, however, mixed modes emerge whose assignment to known compounds can pose a significant challenge.[1] The method presented here, termed DemixC (C stands for clustering),[2] overcomes this limitation by identifying for each component characteristic traces that are essentially free of overlap and can therefore be identified and assigned with high confidence.
Introduction
A fundamental problem in many areas of chemistry is the identification of components in chemical mixtures, such as different solutes in a solution. The recent advent of metabolomics has generated a critical demand for powerful methods to analyze fluid mixtures in the food and life sciences. While important progress is being made with hyphenated methods, which can be laborious and costly, spectroscopic methods have the power to circumvent or reduce the need for hyphenation prior to analysis. Most compounds contain multiple J-coupled NMR-active spins, which allow the identification of spin-spin coupling networks for discriminating between components and for their subsequent identification by screening against a database. Particularly useful in this regard is the 2D 1H-1H TOCSY NMR experiment, which monitors multiple relay transfers of spin magnetization within a spin system and thereby provides a wealth of information on scalar spin-spin coupling connectivity with high sensitivity. Because experimental efficiency is a prerequisite for high-throughput applications, TOCSY is combined here with covariance NMR,[3]-[5] which produces high-resolution spectra largely independent of the number of increments along the indirect time domain t1.
Results and Discussion
The DemixC method is demonstrated for three samples of differing complexity. Sample I consists of three amino acids (Glu, Lys, Val) dissolved in D2O. Sample II contains four amino acids (Glu, Leu, Lys, Val) in D2O. The amino acid concentration in samples I and II is 7 mM. Sample III contains the cyclic decapeptide antamanide [-Val-Pro-Pro-Ala-Phe-Phe-Pro-Pro-Phe-Phe-] dissolved in deuterated chloroform at a concentration of 1 mM. Although the dissolved peptide of sample III is not an actual mixture, in terms of its proton NMR properties it behaves like a mixture of 10 amino acids at 1 mM each. The low variability of its amino acid composition (four phenylalanine and four proline residues) leads to significant resonance overlap, providing a rigorous test case for the performance of the proposed method.
Covariance processing was performed in the mixed time-frequency domain as described previously.[5] Briefly, for 2D TPPI TOCSY datasets, the time-domain data are Fourier-transformed along the direct dimension t2, phase- and baseline-corrected, stripped of the dispersive part, and then subjected to singular-value decomposition (SVD) to determine the matrix square root of the covariance spectrum. For 2D TPPI-States TOCSY datasets, the cosine- and sine-modulated t1 parts are first Fourier-transformed along t2, followed by phase correction and elimination of the dispersive parts. The square root of the covariance spectrum, C, is then computed by applying SVD individually to the cosine- and the sine-modulated parts before they are co-added. As is characteristic of covariance NMR, the resulting spectra are fully symmetric and display the same high spectral resolution along both dimensions. Examples of covariance TOCSY spectra of the three samples are shown in Figure 1.
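The SVD step of the covariance processing described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. It assumes that F is a real N1 × N2 matrix holding the phased, baseline-corrected absorptive data after Fourier transformation along t2 (rows: t1 increments; columns: ω2 points). It takes C = (F^T F)^{1/2}, without mean-centering, as the covariance spectrum. For the thin SVD F = U S V^T this square root equals V S V^T. The function names `covariance_sqrt` and `covariance_states` are hypothetical.

```python
import numpy as np

def covariance_sqrt(F: np.ndarray) -> np.ndarray:
    """Return C = (F^T F)^(1/2) for a real N1 x N2 matrix F whose rows are t1
    increments and whose columns are omega2 frequency points (absorptive part)."""
    # Thin SVD: F = U diag(s) Vt, hence F^T F = Vt.T diag(s**2) Vt and its
    # positive semidefinite square root is Vt.T diag(s) Vt.
    _, s, Vt = np.linalg.svd(F, full_matrices=False)
    return (Vt.T * s) @ Vt

def covariance_states(F_cos: np.ndarray, F_sin: np.ndarray) -> np.ndarray:
    """TPPI-States case as described in the text: the square roots of the
    cosine- and sine-modulated parts are computed separately, then co-added."""
    return covariance_sqrt(F_cos) + covariance_sqrt(F_sin)

# Synthetic stand-in for phased, real-valued mixed time-frequency data.
rng = np.random.default_rng(0)
F = rng.standard_normal((64, 256))   # 64 t1 increments, 256 omega2 points
C = covariance_sqrt(F)
print(C.shape, np.allclose(C, C.T), np.allclose(C @ C, F.T @ F))   # (256, 256) True True
```

The check confirms the two properties the text relies on: C is symmetric, and it is a true matrix square root of F^T F. Its size along both axes is set by the ω2 grid, not by the number of t1 increments.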

Next, the similarity or overlap Oij between each row vector ci and column vector cj of C is determined. The inner product between these vectors (traces) represents a suitable metric of similarity [Eq. (1)]:
$$O_{ij} = \mathbf{c}_i \cdot \mathbf{c}_j = \sum_k C_{ik}\,C_{kj} \qquad (1)$$
or, in matrix notation,
$$\mathbf{O} = \mathbf{C}^{\mathrm{T}}\mathbf{C} = \mathbf{C}^{2} \qquad (2)$$
where O is the overlap matrix with elements Oij; the last equality holds because C is symmetric. The larger Oij is, the higher the overlap and thereby the similarity of the covariance TOCSY traces represented by vectors ci and cj.
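As a quick numerical check of Eqs. (1) and (2), the NumPy snippet below computes O as the matrix product C C. It then verifies that a single element equals the inner product of row i and column j, and that C C equals C^T C for a symmetric C. The helper name `overlap_matrix` and the 2 × 2 matrix are purely illustrative.

```python
import numpy as np

def overlap_matrix(C: np.ndarray) -> np.ndarray:
    """O_ij = (row i of C) . (column j of C), i.e. O = C C (Eqs. 1 and 2)."""
    return C @ C

# Small symmetric example standing in for a covariance spectrum.
C = np.array([[2.0, 1.0],
              [1.0, 3.0]])
O = overlap_matrix(C)                  # [[5, 5], [5, 10]]
# Element-wise definition (Eq. 1) agrees with the matrix form (Eq. 2).
assert O[0, 1] == C[0, :] @ C[:, 1]
assert np.allclose(O, C.T @ C)         # holds because C is symmetric
```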
Because diagonal peaks tend to have disproportionately large amplitudes
that would dominate the inner products, each diagonal peak is replaced,
prior to the overlap calculation, by a Gaussian peak with the
amplitude of the largest nondiagonal peak in the same column or row.
This leads to a modified overlap matrix O
for which the influence of the diagonal of the covariance spectrum is diminished.
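The diagonal replacement can be sketched as follows. This simplified version is an assumption, not the published procedure. It works column by column on a 1D band |i − j| ≤ `half_width` around the diagonal, rather than on a 2D diagonal peak. The replacement is a Gaussian of hypothetical width `sigma` (in grid points) whose amplitude equals the largest off-band magnitude in that column. The overlap matrix is then formed from the modified traces.

```python
import numpy as np

def suppress_diagonal(C: np.ndarray, half_width: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Replace, in every column j, the diagonal band |i - j| <= half_width by a
    Gaussian centred on i = j whose amplitude equals the largest off-band
    magnitude |C[i, j]| of that column. Returns a modified copy of C."""
    n = C.shape[0]
    Cm = C.astype(float)                  # astype copies, so C itself is unchanged
    idx = np.arange(n)
    for j in range(n):
        band = np.abs(idx - j) <= half_width
        off_band = np.abs(C[~band, j])
        amp = off_band.max() if off_band.size else 0.0
        Cm[band, j] = amp * np.exp(-((idx[band] - j) ** 2) / (2.0 * sigma ** 2))
    return Cm

def modified_overlap(C: np.ndarray, **kwargs) -> np.ndarray:
    """Overlap matrix of the diagonal-suppressed traces (inner products of
    modified columns i and j)."""
    Cm = suppress_diagonal(C, **kwargs)
    return Cm.T @ Cm

C = np.full((6, 6), 0.5)
np.fill_diagonal(C, 100.0)               # huge diagonal, weak cross peaks
Cm = suppress_diagonal(C, half_width=0)
print(Cm[0, 0], C[0, 0])                 # 0.5 100.0
```

Because the replacement is done per column, the modified matrix is in general no longer exactly symmetric. The sketch therefore forms overlaps explicitly from the modified columns (Cm^T Cm) rather than squaring Cm.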
Next, the elements of each column of O are co-added to form a vector P with elements Pj, termed the importance index [Eq. (3)]:
$$P_j = \sum_i O_{ij} \qquad (3)$$
The importance index Pj is a quantitative measure of the cumulative overlap between the TOCSY trace (column or row) at frequency j and all other traces (columns or rows). A large Pj value indicates that column j of the covariance TOCSY spectrum overlaps strongly with other rows, whereas a low Pj value reflects little overlap. Overlaps stem from rows belonging to other spins of the same spin system, as well as from rows of other spin systems whose resonances overlap with the resonances of row j.
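Eq. (3) amounts to taking column sums of the modified overlap matrix. The sketch below uses a hypothetical helper `importance_index` and a made-up 3 × 3 toy matrix. It also sorts the traces by P, which puts the trace with the least overlap first.

```python
import numpy as np

def importance_index(O: np.ndarray) -> np.ndarray:
    """P_j = sum_i O_ij: column sums of the (modified) overlap matrix, Eq. (3)."""
    return O.sum(axis=0)

# Toy overlap matrix: traces 0 and 1 overlap strongly, trace 2 barely overlaps.
O = np.array([[4.0, 3.0, 0.1],
              [3.0, 4.0, 0.2],
              [0.1, 0.2, 4.0]])
P = importance_index(O)       # column sums 7.1, 7.2 and 4.3
order = np.argsort(P)         # trace 2 (least overlap) comes first
print(P, order)
```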
Vectors that belong to the same spin system have resonances at the same
positions provided that the distribution of magnetization via isotropic
mixing during the TOCSY experiment is sufficiently uniform among the
spins.
In the next step, a subset of rows of interest is identified by applying standard peak picking to the importance index P. This involves the determination of local maxima above a given threshold, which should lie above the noise floor and can be adjusted to exclude weak traces that are not of interest. The result is a list of rows of the covariance TOCSY spectrum representing a small subset of all traces. The members of this list are then clustered on the basis of the mutual overlaps of the normalized rows of C,
$$\hat{O}_{ij} = \frac{\mathbf{c}_i \cdot \mathbf{c}_j}{\lVert\mathbf{c}_i\rVert\,\lVert\mathbf{c}_j\rVert},$$
to identify a unique set of spin systems and compounds. These are then displayed as the corresponding traces of the covariance matrix, with the original diagonal peak scaled to the height of the largest off-diagonal peak in the same trace. The final set of magnitude traces represents the individual components, which can be identified and assigned, for example, by screening against a spectral database.
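The peak-picking and display steps can be sketched in NumPy. The sketch assumes that a peak is a strict local maximum of P above a user-chosen `threshold`. It also assumes that the diagonal region of a trace is a band of hypothetical half-width `half_width` grid points. Both functions are illustrative and are not the authors' implementation.

```python
import numpy as np

def pick_peaks(P: np.ndarray, threshold: float) -> np.ndarray:
    """Indices j at which P_j is a strict local maximum and exceeds `threshold`."""
    P = np.asarray(P, dtype=float)
    j = np.arange(1, len(P) - 1)                      # interior points only
    is_max = (P[j] > P[j - 1]) & (P[j] > P[j + 1])
    return j[is_max & (P[j] > threshold)]

def display_trace(C: np.ndarray, j: int, half_width: int = 2) -> np.ndarray:
    """Row j of C with its diagonal region rescaled so that its maximum equals
    the largest off-diagonal value of the same row."""
    row = C[j].astype(float)                          # copy of the trace
    band = np.abs(np.arange(row.size) - j) <= half_width
    if (~band).any():
        diag_max, off_max = row[band].max(), row[~band].max()
        if diag_max > 0:
            row[band] *= off_max / diag_max
    return row

P = np.array([0.0, 3.0, 1.0, 0.5, 4.0, 2.0, 0.2, 0.3, 0.1])
print(pick_peaks(P, threshold=1.0))   # [1 4]; the weak maximum at index 7 is dropped
```

`scipy.signal.find_peaks(P, height=threshold)` is a comparable, more configurable alternative, for example when a minimum peak distance or prominence is also needed.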
The
DemixC method is first demonstrated for sample I, which contains amino
acids E, K, V. The covariance TOCSY spectrum is shown in
Figure 1 A. The importance index vector P is constructed from O
followed by peak picking (Figure 1 D). In this way, nine cross sections (rows) in the covariance spectrum C are identified (peak positions marked by filled circles in Figure 1 D) and plotted in Figure 2 A. The mutual overlaps
between the nine rows are shown in Figure 2 B. The higher the normalized
overlap of two rows i and j, the more similar they are.

Basic clustering of the overlaps immediately reveals that rows 1, 5, and 6 represent the same compound (or spin system), rows 2 and 3 represent a second compound, and rows 4, 7, 8, and 9 represent a third compound. Because all nine rows can be assigned to one of the three clusters, it follows that the TOCSY spectrum of sample I contains no other detectable compound. The three clusters are represented by the trace spectra 1, 2, and 4.
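The text does not specify the clustering algorithm. One simple possibility, sketched here as an assumption, is single-linkage grouping. Two picked traces fall into the same cluster if a chain of normalized overlaps (cosine similarities) at or above a hypothetical cutoff of 0.8 connects them. The toy traces stand in for three spin systems with non-overlapping resonances. The function names are hypothetical.

```python
import numpy as np

def normalized_overlaps(rows: np.ndarray) -> np.ndarray:
    """Cosine similarity between traces (rows of a K x N array)."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit = rows / np.where(norms > 0, norms, 1.0)
    return unit @ unit.T

def cluster_traces(rows: np.ndarray, cutoff: float = 0.8) -> list:
    """Single-linkage grouping: traces i and j share a cluster if they are
    connected by a chain of pairwise normalized overlaps >= cutoff."""
    S = normalized_overlaps(rows)
    k = len(rows)
    label = [-1] * k
    clusters = []
    for start in range(k):
        if label[start] != -1:
            continue                      # already assigned to a cluster
        label[start] = len(clusters)
        stack, members = [start], []
        while stack:                      # depth-first search over strong overlaps
            i = stack.pop()
            members.append(i)
            for j in range(k):
                if label[j] == -1 and S[i, j] >= cutoff:
                    label[j] = len(clusters)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters

rows = np.array([[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
                 [0.9, 1.0, 0.0, 0.0, 0.0, 0.05],
                 [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
                 [0.0, 0.0, 2.0, 2.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]])
print(cluster_traces(rows))   # [[0, 2], [1, 4], [3, 5]]
```

Single linkage is convenient because traces of one spin system need not all overlap pairwise above the cutoff. The drawback is that one trace with strong overlap to two spin systems can bridge their clusters, so the cutoff should be chosen with care.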
Next, the compounds underlying the selected cross sections are identified by comparison with 1D spectra contained in an NMR databank. Here we chose the metabolomics/metabonomics part of the Biological Magnetic Resonance Data Bank (BMRB, http://www.bmrb.wisc.edu/metabolomics/), which is publicly available worldwide. The proton 1D spectra in the BMRB for the three amino acids E, K, V are shown in Figure 3 B and compared with their cluster representatives (rows 1, 2, and 4 of Figure 3 A). The correspondence between the covariance traces and the BMRB spectra is very good; even the peak multiplets agree well. Differences in relative peak intensities stem from nonuniform TOCSY transfer, differential relaxation effects, and the scaling of the diagonal part of the covariance TOCSY traces.
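Screening a trace against reference spectra can be sketched with a cosine-similarity score. Everything here is an assumption for illustration. The reference spectra are made-up Gaussian line lists rather than BMRB entries, all spectra are assumed to share one ppm grid, and cosine similarity is only one possible matching score. Retrieving and resampling actual BMRB spectra is not shown, and the function names are hypothetical.

```python
import numpy as np

def gaussian_lines(ppm, shifts, heights, width=0.01):
    """Synthetic 1D spectrum: a sum of Gaussian lines on the grid `ppm`."""
    ppm = np.asarray(ppm, dtype=float)
    spec = np.zeros_like(ppm)
    for d, h in zip(shifts, heights):
        spec += h * np.exp(-0.5 * ((ppm - d) / width) ** 2)
    return spec

def best_match(trace, references):
    """Return (name, score) of the reference with the highest cosine similarity
    to `trace`. All spectra must be sampled on the same ppm grid."""
    t = np.asarray(trace, dtype=float)
    t = t / np.linalg.norm(t)
    scores = {}
    for name, ref in references.items():
        r = np.asarray(ref, dtype=float)
        scores[name] = float(t @ r) / float(np.linalg.norm(r))
    name = max(scores, key=scores.get)
    return name, scores[name]

ppm = np.linspace(0.0, 5.0, 501)
references = {                                    # made-up reference spectra
    "compound X": gaussian_lines(ppm, [1.0, 2.0], [1.0, 1.0]),
    "compound Y": gaussian_lines(ppm, [3.0, 4.0], [1.0, 1.0]),
}
# A trace with the lines of X but different relative intensities.
trace = gaussian_lines(ppm, [1.0, 2.0], [1.0, 0.5])
print(best_match(trace, references))              # compound X, score of about 0.95
```

Because the score is normalized, moderate differences in relative intensities, such as those caused by nonuniform TOCSY transfer, lower it only modestly. In this example, intensities of 1.0 and 0.5 instead of 1.0 and 1.0 still give a score of about 0.95.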

In the absence of overlaps between resonances belonging to different spin systems, as is the case for sample I, TOCSY traces for spins of the same spin system reflect the 1D spectrum of the spin system, and therefore they contain equivalent information. In the presence of overlap, the situation changes, as is seen for sample II, which contains Leu (L) as a fourth amino acid. For this mixture, significant peak overlap occurs, particularly between the Leu and Lys spin systems (Figure 1 B). Still, the protocol produces clusters of traces which can be assigned to individual components (Figure 3 C). Importantly, the representative trace for each cluster is chosen to have a minimal importance index (Figure 1 E). This ensures selection of those traces that have low overlap with other spin systems. From Figure 3 it is evident that the selected traces (Panel C) agree well with the BMRB spectra of these components (Panel D).
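Choosing each cluster's representative as the member with the smallest importance index is a one-line reduction. The sketch below uses hypothetical names and made-up values. `picked` holds trace indices returned by peak picking, `clusters` holds positions into that list, and `P` maps a trace index to its importance index (a NumPy array would work equally well).

```python
def cluster_representatives(clusters, picked, P):
    """For each cluster (positions into the list `picked` of peak-picked trace
    indices), return the trace index with the smallest importance index P[j]."""
    return [min((picked[m] for m in members), key=lambda j: P[j])
            for members in clusters]

picked = [10, 14, 22, 30, 41, 57]          # trace indices from peak picking
P = {10: 5.0, 14: 1.0, 22: 7.0, 30: 3.0, 41: 2.0, 57: 9.0}
clusters = [[0, 1], [2, 3, 5], [4]]        # positions into `picked`
print(cluster_representatives(clusters, picked, P))   # [14, 30, 41]
```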
Application
of the algorithm to the cyclic decapeptide antamanide provides a
stringent test of the deconvolution method. The ten amino acids lead to
the rich covariance spectrum shown in Figure 1 C, which exhibits substantial peak overlaps. Peak picking of the importance vector (Figure 1 F) yields 33 trace vectors together with their mutual overlap matrix (see Supporting Information).
Inspection of the traces reveals numerous regions with strong overlap.
Cluster analysis yields the 11 representative traces depicted in
Figure 4. The bottom 10 traces correspond to
the amide and aliphatic proton resonances of the 10 amino acids, while
trace 11 (top trace in Figure 4) represents the strongly overlapping aromatic resonances of the phenylalanine rings. The amino acid traces of Figure 4 are fully consistent with the assignments of antamanide.
The traces of Ala and Val are as easily identified as the traces of
samples I and II. The four Phe and four Pro residues show significant
variability in their chemical shifts due to structural and dynamic
differences. The residue that overlaps most severely is F9: three of its
proton signals fully or partially overlap with those of F6, and its HN
proton signal overlaps with that of F10. Nonetheless, the DemixC
protocol succeeds in finding a representative trace of this residue.

The analysis of mixtures by the DemixC method is based on the abundant spin connectivities in total correlation spectroscopy and provides an efficient means for the spectral identification of spin systems and their compounds. Covariance processing ensures high resolution along both frequency dimensions, which is critical for the success of the method. Covariance TOCSY differs fundamentally from STOCSY (statistical total correlation spectroscopy) in that it computes covariances over 1D spectra of the same sample recorded with different t1 evolution times, whereas STOCSY computes covariances over 1D spectra of different samples. A previous method based on principal-component analysis (PCA)[1] requires a series of TOCSY spectra recorded with different mixing times. The method presented here has no such requirement, provided that the chosen mixing time is long enough to allow sufficient magnetization transfer throughout the whole spin system. For the mixtures used here, mixing times between 60 and 100 ms work well. Longer mixing times are feasible, although relaxation effects will lower the signal-to-noise ratio.
For compounds that contain multiple spin systems (i.e., spin systems that are disconnected from each other), as is the case for the individual amino acids of antamanide, each spin system yields an independent trace as if it belonged to an individual molecule. When identifying compounds in mixtures from the covariance TOCSY traces, this property of the TOCSY experiment must be taken into account.
The DemixC method readily identifies the best candidates for individual spin-system traces based on their importance index, that is, the sum of their overlaps with all other candidate traces. Essential for the success of the method is the recognition that traces with a low to medium importance index are more likely to represent individual spin systems, whereas traces with a large importance index are more likely to be affected by overlap. By contrast, the PCA method tends to represent overlapping spin systems by some of the largest modes when together they explain a larger fraction of the TOCSY spectrum.
Extreme resonance overlap imposes natural restrictions: if all resonances of a certain compound overlap with resonances of other systems, there is no guarantee that the deconvolution method will succeed in identifying the compound. The Phe-9 residue of antamanide represents such a case. Although the deconvolution procedure produces the correct result, generally the trace selection tends to become ambiguous when the number of overlaps of a component is very large.
The DemixC deconvolution approach introduced here takes full advantage of the high spectral resolution and redundant connectivity information of covariance TOCSY spectra. The trace analysis based on the importance index and subsequent clustering is highly efficient and remarkably robust, and it provides individual 1D spectral information on the underlying spin systems. The method is directly applicable to the semi-automated side-chain assignment of peptides and small proteins. As small-molecule NMR databases such as the BMRB metabolomics databank grow rapidly, traces identified in covariance TOCSY spectra can be screened automatically against these databases to identify and quantify the compounds that give rise to them. This provides an efficient and reliable path toward the deconvolution of complex biological mixtures.