Enhanced Data Mining and Visualization of Sensory-Graph-Modeled Datasets through Summarization.
Syed Jalaluddin HashmiBayan AlabdullahNaif Al MudawiAsaad AlgarniAhmad JalalHui LiuPublished in: Sensors (Basel, Switzerland) (2024)
The acquisition, processing, mining, and visualization of sensory data for knowledge discovery and decision support has recently been a popular area of research and exploration. Its usefulness is paramount because of its relationship to the continuous involvement in the improvement of healthcare and other related disciplines. As a result of this, a huge amount of data have been collected and analyzed. These data are made available for the research community in various shapes and formats; their representation and study in the form of graphs or networks is also an area of research which many scholars are focused on. However, the large size of such graph datasets poses challenges in data mining and visualization. For example, knowledge discovery from the Bio-Mouse-Gene dataset, which has over 43 thousand nodes and 14.5 million edges, is a non-trivial job. In this regard, summarizing the large graphs provided is a useful alternative. Graph summarization aims to provide the efficient analysis of such complex and large-sized data; hence, it is a beneficial approach. During summarization, all the nodes that have similar structural properties are merged together. In doing so, traditional methods often overlook the importance of personalizing the summary, which would be helpful in highlighting certain targeted nodes. Personalized or context-specific scenarios require a more tailored approach for accurately capturing distinct patterns and trends. Hence, the concept of personalized graph summarization aims to acquire a concise depiction of the graph, emphasizing connections that are closer in proximity to a specific set of given target nodes. In this paper, we present a faster algorithm for the personalized graph summarization (PGS) problem, named IPGS; this has been designed to facilitate enhanced and effective data mining and visualization of datasets from various domains, including biosensors. Our objective is to obtain a similar compression ratio as the one provided by the state-of-the-art PGS algorithm, but in a faster manner. To achieve this, we improve the execution time of the current state-of-the-art approach by using weighted, locality-sensitive hashing, through experiments on eight large publicly available datasets. The experiments demonstrate the effectiveness and scalability of IPGS while providing a similar compression ratio to the state-of-the-art approach. In this way, our research contributes to the study and analysis of sensory datasets through the perspective of graph summarization. We have also presented a detailed study on the Bio-Mouse-Gene dataset, which was conducted to investigate the effectiveness of graph summarization in the domain of biosensors.
Keyphrases
- healthcare
- electronic health record
- neural network
- convolutional neural network
- big data
- randomized controlled trial
- systematic review
- magnetic resonance
- machine learning
- rna seq
- sentinel lymph node
- data analysis
- early stage
- mental health
- high throughput
- copy number
- genome wide
- drug delivery
- transcription factor
- gene expression
- dna methylation
- social media
- artificial intelligence
- health information
- label free