Title: Clustering millions of tandem mass spectra.
Year of Publication: 2008
Authors: Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, Pevzner PA
Journal: J Proteome Res
Date Published: 2008 Jan
Keywords: Amino Acid Sequence, Cluster Analysis, Computational Biology, Molecular Sequence Data, Peptides, Proteomics, Tandem Mass Spectrometry

Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only the representative spectra significantly speeds up MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) that can reduce the number of spectra submitted for further analysis by an order of magnitude. Compared with regular nonclustered searches, an MS/MS database search of clustered spectra yields fewer spurious hits to the database and increases the number of peptide identifications. Our open source software MS-Clustering is available for download at or can be run online at
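The abstract does not spell out the clustering procedure. As a rough illustration of the general idea only, and not MS-Clustering's actual algorithm, the Python sketch below groups spectra greedily: peaks are summed into m/z bins, each spectrum is compared with existing cluster representatives, and it joins a cluster when the precursor m/z values agree within a tolerance and the binned spectra have a high cosine similarity. The bin width, the tolerance `mz_tol`, the threshold `min_cosine`, and the rule that a cluster's first (seed) spectrum serves as its representative are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Spectrum:
    precursor_mz: float
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs


def binned_vector(spec: Spectrum, bin_width: float = 1.0) -> dict[int, float]:
    """Sum peak intensities into m/z bins and L2-normalize (assumed scheme)."""
    vec: dict[int, float] = {}
    for mz, intensity in spec.peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def cluster_spectra(spectra: list[Spectrum], mz_tol: float = 0.5,
                    min_cosine: float = 0.7) -> list[dict]:
    """Greedy clustering (hypothetical): each spectrum joins the first cluster
    whose representative matches on precursor m/z and cosine similarity;
    otherwise it seeds a new cluster and becomes that cluster's representative."""
    vectors = [binned_vector(s) for s in spectra]
    order = sorted(range(len(spectra)), key=lambda i: spectra[i].precursor_mz)
    clusters: list[dict] = []  # each: {"rep": index, "members": [indices]}
    for i in order:
        for c in clusters:
            r = c["rep"]
            if (abs(spectra[r].precursor_mz - spectra[i].precursor_mz) <= mz_tol
                    and cosine(vectors[r], vectors[i]) >= min_cosine):
                c["members"].append(i)
                break
        else:
            clusters.append({"rep": i, "members": [i]})
    return clusters


if __name__ == "__main__":
    spectra = [
        Spectrum(500.3, [(100.1, 10), (200.2, 20), (300.3, 5)]),
        Spectrum(500.4, [(100.2, 9), (200.1, 21), (300.4, 6)]),
        Spectrum(800.0, [(150.0, 10), (250.0, 10)]),
    ]
    clusters = cluster_spectra(spectra)
    # Two near-identical spectra merge; only the representatives would be searched.
    print(len(clusters), "clusters:", [c["members"] for c in clusters])
```

Running it prints `2 clusters: [[0, 1], [2]]`, so three input spectra reduce to two representatives. A greedy single pass keeps the sketch short; a tool built for millions of spectra would need indexing by precursor mass rather than a scan over all clusters.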

PubMed ID: 18067247