
Using signal processing and machine learning to study complex diseases

Texas A&M University faculty researcher Xiaoning Qian uses signal processing and machine learning tools to determine which genes are critical for understanding and predicting disease progression, so that biologists can use that information to develop new disease management practices.

Modern biological experiments generate large amounts of data. Big biomedical data involves data sets that are more complex than traditional data-processing software can handle. Complex diseases produce exactly this kind of data, and making sense of it requires appropriate mathematical models and analytic methods.

How genetic differences and environmental stress change a living system is the question Qian aims to answer. To begin his analysis, Qian receives data sets on affected genes from biologists. He and his students then develop models and algorithms to analyze that data. The goal is to identify important genes and to work out which genes interact with one another and which trigger the system's response, for example in immune pathways.

“We need to help this gene identification procedure,” said Qian, assistant professor in the Department of Electrical and Computer Engineering at Texas A&M University. “We need to have the statistical methods and computational algorithms to look at the data, to analyze the data, and then try to identify the change specifically due to different genetic and environmental perturbations.”
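As a rough illustration of what "identifying the change" can mean in the simplest case, the sketch below ranks genes by Welch's t-statistic between control and perturbed samples. This is a generic textbook technique, not Qian's method; the function names and the toy expression values are hypothetical and purely illustrative.

```python
import math
from statistics import mean, variance


def welch_t(control, perturbed):
    """Welch's t-statistic for the difference in mean expression
    between a perturbed condition and a control condition."""
    se = math.sqrt(variance(control) / len(control)
                   + variance(perturbed) / len(perturbed))
    if se == 0:
        return 0.0  # no spread in either group: treat as no detectable change
    return (mean(perturbed) - mean(control)) / se


def rank_genes(expr):
    """Rank genes by |t|, largest change first.

    expr maps a (hypothetical) gene name -> (control_values, perturbed_values).
    """
    scores = {gene: welch_t(c, p) for gene, (c, p) in expr.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy expression values, not real data.
expr = {
    "geneA": ([5.1, 4.9, 5.0], [7.9, 8.2, 8.1]),
    "geneB": ([3.0, 3.2, 2.9], [3.1, 3.0, 3.2]),
    "geneC": ([6.0, 6.3, 5.8], [4.1, 4.4, 3.9]),
}

for gene, t in rank_genes(expr):
    print(f"{gene}: t = {t:.2f}")
```

With only three samples per condition, a per-gene test like this is fragile, which is one motivation for the Bayesian approaches discussed next.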

Qian's goal is to develop analytic methods that yield biologically meaningful results other researchers can validate. Ultimately, this could lead to user-friendly software that lets biologists easily get the information they need. For now, the team is in the early stages of developing methods to analyze genomic data effectively. One approach they are pursuing is to incorporate Bayesian methods.

“The idea of Bayesian is that if you base your analysis on a limited number of data samples, there is lots of uncertainty,” Qian said. “You don’t want to ignore that uncertainty – you want to incorporate that uncertainty in your analysis to make sure your derived results are robust.”
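The point about uncertainty can be made concrete with a minimal sketch, assuming the simplest conjugate setting: an unknown mean expression change mu with a Normal prior and Normal measurement noise of known size. The prior and noise values below are illustrative assumptions, not parameters from Qian's work. With few samples, the posterior stays wide; with more samples, it narrows.

```python
import math


def posterior_normal_mean(samples, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Conjugate Normal-Normal update for an unknown mean.

    Prior:      mu ~ Normal(prior_mean, prior_sd**2)
    Likelihood: each sample ~ Normal(mu, noise_sd**2), noise_sd assumed known
    Returns the posterior (mean, standard deviation) of mu.
    """
    n = len(samples)
    prior_prec = 1.0 / prior_sd**2
    if n == 0:
        return prior_mean, prior_sd  # no data: posterior equals prior
    data_prec = n / noise_sd**2
    post_prec = prior_prec + data_prec
    sample_mean = sum(samples) / n
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical measurements of one gene's expression change.
few = [1.2, 0.8, 1.0]
many = few * 10  # same values, ten times as many samples

for label, data in (("3 samples", few), ("30 samples", many)):
    m, sd = posterior_normal_mean(data, noise_sd=0.5)
    print(f"{label}: mean = {m:.3f}, approx. 95% interval = "
          f"[{m - 1.96 * sd:.3f}, {m + 1.96 * sd:.3f}]")
```

Rather than reporting a single point estimate, the posterior standard deviation carries the uncertainty forward into any downstream analysis, which is the kind of robustness the quote describes.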