We performed a series of bioinformatics analyses on a gene expression data set of 76 samples from early-stage non-small cell lung cancer, comprising 40 adenocarcinoma samples, 16 squamous cell carcinoma samples and 20 normal samples. To identify specific markers for diagnosis, we compared each of the two subtypes with the normal samples to determine its gene expression characteristics. Multidimensional scaling showed that the samples clustered well by disease status. Based on this grouping, and using empirical Bayes moderation with the treat method, we identified 486 genes associated with the disease. We analyzed the functions and pathways of these genes to verify our results and to explain the pathogenic factors and processes. We then generated a protein-protein interaction network from the interactions among the selected genes and found that the top thirteen hub genes, five of which were newly identified by our method, were highly associated with lung cancer or other cancers. These results indicate that contrasting gene expression between each subtype and normal samples provides important information for the detection of non-small cell lung cancer and aids exploration of the disease pathogenesis.
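The hub-gene step can be illustrated with a minimal sketch. The abstract does not state how hubs were ranked, so the sketch assumes the common convention of ranking by degree, that is, the number of distinct interaction partners a gene has in the network. The function name `rank_hub_genes`, the edge-list input format and the gene names are hypothetical placeholders, not data or code from the study.

```python
from collections import defaultdict


def rank_hub_genes(edges, top_n=13):
    """Rank genes by degree in an undirected interaction network.

    Assumption: a "hub" is a gene with many distinct interaction
    partners. The study may have used a different centrality measure.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, then by name for a deterministic order.
    ranked = sorted(neighbors.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


# Hypothetical toy network; the names are placeholders, not study genes.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]
hubs = rank_hub_genes(toy_edges, top_n=2)
print(hubs)
```

Using sets for the neighbor lists means that duplicate edges, which are common when interaction records are merged from several sources, do not inflate a gene's degree.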
Keyphrases
- gene expression
- bioinformatics analysis
- early stage
- squamous cell carcinoma
- genome wide
- dna methylation
- genome wide identification
- protein protein
- machine learning
- small molecule
- magnetic resonance
- magnetic resonance imaging
- deep learning
- healthcare
- copy number
- transcription factor
- genome wide analysis
- radiation therapy
- locally advanced
- social media
- cystic fibrosis
- sentinel lymph node
- staphylococcus aureus
- pseudomonas aeruginosa
- health information
- sensitive detection
- lymph node metastasis
- biofilm formation
- loop mediated isothermal amplification
- network analysis