  • 20033 tandem mass spectrometry (LC-MS/MS) employing a

    2020-08-18


    203 tandem mass spectrometry (LC-MS/MS) employing a nanoflow LC system (EASY-nLC 1200,
    204 Thermo Fisher Scientific) coupled online with a tribrid Orbitrap MS (Fusion Lumos, Thermo
    205 Fisher Scientific). Peptide identifications were generated by searching raw LC-MS data files
    206 with a publicly available, non-redundant human proteome database (Swiss-Prot, Homo sapiens
    ACCEPTED MANUSCRIPT
    208 Science) and Proteome Discoverer (PD). The resulting peptide spectral matches (PSMs) were
    210 Percolator module of PD. TMT reporter ion intensities were extracted using PD at a mass
    211 tolerance of 20 ppm and PSMs lacking a TMT reporter ion signal in the pooled study reference
    212 channel, lacking TMT reporter ion intensity in all TMT channels, or PSMs exhibiting an
    213 isolation interference of ≥ 50% were excluded from downstream analyses.
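
The three exclusion criteria above can be sketched as a single predicate. This is a minimal illustration, not the Proteome Discoverer workflow itself: the `PSM` record, the `keep_psm` helper, and the reference channel label `"126"` are hypothetical names introduced here, and a missing reporter signal is assumed to be stored as an intensity of 0.0.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PSM:
    accession: str               # protein accession the PSM maps to
    reporter: Dict[str, float]   # TMT channel -> reporter ion intensity (0.0 = no signal)
    interference: float          # isolation interference, in percent


def keep_psm(psm: PSM, reference_channel: str = "126",
             max_interference: float = 50.0) -> bool:
    """Return True if the PSM passes all three exclusion criteria.

    The reference channel label is an assumption; the source does not say
    which TMT channel held the pooled study reference.
    """
    # Exclude PSMs with no reporter ion intensity in any TMT channel.
    if all(value <= 0.0 for value in psm.reporter.values()):
        return False
    # Exclude PSMs lacking a signal in the pooled reference channel.
    if psm.reporter.get(reference_channel, 0.0) <= 0.0:
        return False
    # Exclude PSMs with isolation interference of 50% or more.
    if psm.interference >= max_interference:
        return False
    return True
```

Applying the predicate to a list, for example `[p for p in psms if keep_psm(p)]`, yields the PSMs carried into downstream quantification.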
    214 The abundance of proteins identified by unique PSMs was determined by calculating the
    215 median log2 abundance ratios of all PSMs corresponding to a unique protein accession. The
    216 abundances of PSMs mapping to multiple proteins were compared to corresponding unique
    217 protein abundances using a mean squared error (MSE) approach to enable their assignment to
    218 unique proteins based on comparative abundance analyses. Protein-level abundance was
    219 calculated from normalized, median log2-transformed TMT reporter ion ratio abundances from a
    220 minimum of two PSMs corresponding to a single protein accession. Normalized log2-
    221 transformed protein-level abundance for each TMT-11 multiplex was merged and protein-level
    222 abundance for proteins not quantified in all patient samples, but in at least 50%, was imputed
    223 using a k-nearest neighbor (k-NN) strategy in the pamr prediction analysis for microarrays R-
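
The protein-level roll-up described above (median log2 ratio per accession, minimum of two PSMs) can be sketched as follows. This is an illustration under stated assumptions: each ratio is taken as sample intensity over pooled-reference intensity, normalization and k-NN imputation are omitted, and `protein_log2_ratios` is a hypothetical helper name.

```python
import math
from collections import defaultdict
from statistics import median


def protein_log2_ratios(psms, min_psms=2):
    """Median log2(sample / reference) per protein accession.

    psms: iterable of (accession, sample_intensity, reference_intensity).
    Proteins supported by fewer than min_psms PSMs are left out.
    """
    by_protein = defaultdict(list)
    for accession, sample, reference in psms:
        by_protein[accession].append(math.log2(sample / reference))
    return {
        accession: median(ratios)
        for accession, ratios in by_protein.items()
        if len(ratios) >= min_psms
    }
```

The median is used rather than the mean so that a single outlying PSM has limited influence on the protein-level value.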
    225 Differences in baseline demographics were determined by χ². Cases were separated into
    226 three groups based on time of blood draw prior to endometrial cancer diagnosis: case group 1 (≤
    228 early detection serum biomarkers are a subset of the differentially abundant proteins between
    229 case group 1 and matched controls. The differentially abundant proteins for case group 1 versus
    230 matched controls were identified using the Linear Models for Microarray Data (LIMMA) method
    232 significantly altered in case group 1 and matched controls exhibiting identical fold-change trends
    233 in comparative analyses of cases and controls (i.e. case group 2 versus matched controls, case
    234 group 3 versus matched controls, case group 1 versus case group 2, and case group 2 versus case
    235 group 3) were prioritized for integrated risk score analyses. Further, prioritized protein
    236 abundance alterations were assessed relative to serum albumin protein abundance across cohorts
    237 to confirm that alteration trends were not confounded by differential efficiency of
    238 immunodepletion column performance.
    239 The least absolute shrinkage and selection operator (LASSO) regression analysis was
    240 applied to prioritized protein alterations using the glmnet R package (version 2.0-5) and by
    241 further selecting the optimal tuning parameter with the lowest MSE by running 10-fold cross-
    242 validation 100 times. This analysis yielded a predictive set of proteins highly associated with
    243 cancer status, and an integrated score was calculated for each sample by summing the relative
    244 abundance of the selected proteins weighted by their respective LASSO coefficients. The
    245 performance of the integrated score was evaluated using receiver operating characteristic (ROC) curve analysis
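
The integrated score described above is a coefficient-weighted sum of protein abundances, which can be sketched in a few lines. The sketch does not fit the LASSO model (the source used the glmnet R package for that); it assumes the coefficients are already available, omits any intercept because the text describes only a weighted sum, and summarizes the ROC curve by its area, which is an assumption about how performance was reported. The function names are hypothetical.

```python
def integrated_score(abundance, coefficients):
    """Sum of relative abundances weighted by their LASSO coefficients.

    abundance:    protein -> relative abundance for one sample
    coefficients: protein -> coefficient, for the selected proteins only
    """
    return sum(coef * abundance[protein] for protein, coef in coefficients.items())


def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case scores higher than a
    randomly chosen control, counting ties as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Proteins that LASSO did not select carry no coefficient and therefore do not contribute to the score, even if they appear in the sample's abundance table.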
    251 Demographics were compared between cases and controls (Table 1). Mean age for cases was
    255 to have used oral contraceptives (P=0.022). Controls were more likely to have been current or
    256 former smokers (P=0.013). There were no differences between groups among other known risk
    257 or protective factors for endometrial cancer.
    264 Quantitative proteomic analyses resulted in the identification of 1,100 total proteins, 565
    265 of which were co-quantified across all patient samples (Supplemental Table 1). Endometrial
    266 cancer cases were then separated into 3 groups based on time of blood draw prior to endometrial
    269 (diagnosed >5 years). Forty-seven proteins were identified as significantly altered (LIMMA p-
    270 value < 0.05) between case group 1 and their matched controls. Ten proteins were further
    271 prioritized as they exhibited consistent abundance alterations in cases compared to controls