Pitfalls of using sequence databases for heterologous expression studies - a technical review.
Stephan Maxeiner, Gabriela Krasteva-Christ, Mike Althaus. Published in: The Journal of Physiology (2023)
Synthesis of DNA fragments based on gene sequences available in public resources has become an efficient and affordable method that has gradually replaced traditional cloning approaches such as PCR cloning from cDNA. However, database entries based on genome sequencing results are prone to errors, which can lead to false sequence information and, ultimately, to errors in the functional characterization of proteins such as ion channels and transporters in heterologous expression systems. We have identified five common problems that repeatedly appear in public resources: 1) not every gene has yet been annotated; 2) not all gene annotations are necessarily correct; 3) transcripts may contain automated corrections; 4) there are mismatches between gene, mRNA, and protein sequences; and 5) splicing patterns often lack experimental validation. This technical review highlights these issues and provides a strategy to bypass them in order to avoid critical mistakes that could affect future studies of any gene or protein of interest in heterologous expression systems.
Abstract figure legend
Projects involving heterologous gene expression often follow similar steps. Initially, database research (A) is necessary to retrieve the full or partial sequence of a gene of interest. Numerous genome assemblies have been annotated and deposited in public databases, or are available for refined searches using individual sequence information. The search results need to be scrutinised and compared with already available information (B). Once the sequence has been determined, DNA synthesis (C), by PCR or commercial synthesis, is necessary for further cloning procedures (D). Eventually, the DNA needs to be transfected (E) and expressed in, e.g., eukaryotic cells (F). Finally, the expression of the gene of interest needs to be documented and its function analysed (G).
This article is protected by copyright. All rights reserved.
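Problem 4 (mismatches between mRNA and protein sequences) and step B (scrutinising search results) can be partly checked by translating a retrieved coding sequence and comparing it residue by residue with the deposited protein sequence. The abstract does not prescribe a tool, so the following is only a minimal sketch, assuming the CDS starts at its first base and uses the standard genetic code (NCBI translation table 1). The function names `translate` and `compare_cds_to_protein` are hypothetical and do not refer to any particular library.

```python
"""Sketch: check whether a coding sequence (CDS) translates to its deposited protein."""
from __future__ import annotations

from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS from its first base, stopping at the first stop codon.

    RNA input (U) is accepted; codons with ambiguous bases (e.g. N) become 'X'.
    A trailing incomplete codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def compare_cds_to_protein(cds: str, protein: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, translated residue, deposited residue) for each mismatch.

    Positions where one sequence is longer are reported with '-' for the missing residue.
    """
    translated = translate(cds)
    mismatches = []
    for pos in range(max(len(translated), len(protein))):
        t = translated[pos] if pos < len(translated) else "-"
        p = protein[pos] if pos < len(protein) else "-"
        if t != p:
            mismatches.append((pos + 1, t, p))
    return mismatches


if __name__ == "__main__":
    cds = "ATGGCTTGGAAATAA"  # ATG GCT TGG AAA TAA -> M A W K (stop)
    print(translate(cds))                       # MAWK
    print(compare_cds_to_protein(cds, "MAWR"))  # [(4, 'K', 'R')]
```

Because the comparison is strictly position-wise, a single inserted or deleted base (for example from an automated transcript correction, problem 3) shows up as a run of mismatches from the frameshift onward. In practice, a pairwise alignment would locate such indels more precisely; the sketch only flags that the mRNA and protein entries disagree and need closer inspection.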