Construction and characterization of a de novo draft genome of garden cress (Lepidium sativum L.).
Aysenur Soyturk Patat, Fatima Sen, Behic Selman Erdogdu, Ali Tevfik Uncu, Ayse Ozgur Uncu
Published in: Functional & integrative genomics (2022)
Garden cress (Lepidium sativum L.) is a Brassicaceae crop recognized as a healthy vegetable and a medicinal plant. Lepidium is one of the largest genera in Brassicaceae, yet the genus has not been a focus of extensive genomic research. In the present work, the garden cress genome was sequenced using long-read, high-fidelity sequencing technology. A de novo draft genome assembly spanning 336.5 Mb was produced, corresponding to 88.6% of the estimated genome size and representing 90% of the evolutionarily expected orthologous gene content. Protein-coding gene content was structurally predicted and functionally annotated, resulting in the identification of 25,668 putative genes. A total of 599 candidate disease resistance genes were identified by predicting resistance gene domains in gene structures, and 37 genes were detected as orthologs of heavy metal-associated protein-coding genes. In addition, 4289 genes were assigned as "transcription factor coding." Six different machine learning algorithms were trained and tested for their performance in classifying miRNA-coding genomic sequences. Logistic regression was the best-performing trained algorithm and was therefore used to identify pre-miRNA coding loci in the assembly. Repetitive DNA analysis involved the characterization of transposable element and microsatellite contents. The L. sativum chloroplast genome was also assembled and functionally annotated. The data produced in the present work are expected to constitute a foundation for genomic research in garden cress and to contribute to genomics-assisted crop improvement and genome evolution studies in the Brassicaceae family.
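The figures quoted in the abstract can be related to one another with simple arithmetic. The sketch below is illustrative only: it back-calculates the genome size implied by "336.5 Mb is 88.6% of the estimated size" and expresses each reported gene class as a share of the 25,668 putative genes. The function and variable names are hypothetical, the back-calculated size is approximate because the abstract's percentages are rounded, and the gene classes may overlap, so their shares should not be summed.

```python
# Values taken from the abstract; everything else here is an illustrative assumption.
ASSEMBLY_MB = 336.5          # draft assembly span, in Mb
ASSEMBLY_FRACTION = 0.886    # assembly span as a fraction of the estimated genome size
TOTAL_GENES = 25_668         # putative protein-coding genes

# Gene classes reported in the abstract (these sets may overlap).
GENE_CLASSES = {
    "candidate disease resistance": 599,
    "heavy metal associated orthologs": 37,
    "transcription factor coding": 4289,
}


def implied_genome_size_mb(assembly_mb: float, fraction: float) -> float:
    """Back-calculate the estimated genome size from the assembly span and its coverage."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return assembly_mb / fraction


def share_of_genes(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


implied_size = implied_genome_size_mb(ASSEMBLY_MB, ASSEMBLY_FRACTION)
shares = {name: share_of_genes(n, TOTAL_GENES) for name, n in GENE_CLASSES.items()}

if __name__ == "__main__":
    print(f"Implied estimated genome size: {implied_size:.1f} Mb")
    for name, pct in shares.items():
        print(f"{name}: {pct:.2f}% of putative genes")
```

Under these inputs the implied genome size comes out at roughly 380 Mb, and transcription factor coding genes account for about one sixth of the predicted gene set.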
Keyphrases
- genome wide
- copy number
- genome wide identification
- dna methylation
- machine learning
- transcription factor
- bioinformatics analysis
- heavy metals
- climate change
- single cell
- gene expression
- single molecule
- risk assessment
- high frequency
- electronic health record
- high resolution
- big data
- resistance training
- health risk
- amino acid
- circulating tumor cells
- sewage sludge