Pitfalls of using sequence databases for heterologous expression studies - a technical review.
Stephan Maxeiner, Gabriela Krasteva-Christ, Mike Althaus
Published in: The Journal of Physiology (2023)
Synthesis of DNA fragments based on gene sequences available in public resources has become an efficient and affordable method that has gradually replaced traditional cloning approaches such as PCR cloning from cDNA. However, database entries based on genome sequencing results are prone to errors, which can lead to false sequence information and, ultimately, to errors in the functional characterisation of proteins such as ion channels and transporters in heterologous expression systems. We have identified five common problems that repeatedly appear in public resources: (1) not every gene has yet been annotated; (2) not all gene annotations are necessarily correct; (3) transcripts may contain automated corrections; (4) there are mismatches between gene, mRNA and protein sequences; and (5) splicing patterns often lack experimental validation. This technical review highlights these issues and provides a strategy to bypass them, in order to avoid critical mistakes that could impact future studies of any gene/protein of interest in heterologous expression systems.
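Problem (4), a mismatch between an mRNA entry and its annotated protein, can be screened for before ordering a synthetic fragment. The following is only a minimal sketch under stated assumptions, not the strategy described in the review: it translates a coding sequence with the standard genetic code and reports positions where the result differs from the annotated protein. The function names `translate` and `compare_cds_to_protein` and the toy sequences are hypothetical, and the sketch assumes the coding sequence starts in frame at the start codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon.

    Hypothetical helper: codons containing ambiguous bases become 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


def compare_cds_to_protein(cds: str, annotated_protein: str):
    """Return (mismatches, same_length).

    mismatches is a list of (1-based position, translated residue,
    annotated residue); same_length is False if the two sequences differ
    in length, which may hint at a frameshift or a truncated entry.
    """
    translated = translate(cds)
    mismatches = [
        (pos, t, a)
        for pos, (t, a) in enumerate(zip(translated, annotated_protein), start=1)
        if t != a
    ]
    return mismatches, len(translated) == len(annotated_protein)


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from any database entry.
    toy_cds = "ATGGCCAAGTAA"
    toy_protein = "MAR"
    print(translate(toy_cds))
    print(compare_cds_to_protein(toy_cds, toy_protein))
```

A check of this kind only flags inconsistencies between two database records. It cannot tell which record is correct, so any discrepancy would still need to be resolved against genomic sequence or experimental data.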