Research Interests:

My research interests include machine learning, statistical and numerical computation, and the design and theoretical analysis of statistical algorithms. I also have extensive experience in large-scale data analysis and statistical modeling, especially in text mining, natural language processing, and web and image applications. My research publications can be found here.


Current Research Projects:

Several of my research projects are supported by various grants. If you are a prospective student or postdoc interested in these research topics, feel free to contact me. Positions might become available and will be filled on an ongoing basis. 

Statistical Learning Theory and Sparsity Analysis, supported by NSF, NSA, AFOSR. These projects study modern statistical machine learning algorithms theoretically in order to understand their behavior in high dimensions. We look for rigorous mathematical theories that explain the effectiveness of optimization formulations in high dimensions (an example is the benefit of group Lasso over standard Lasso), and we use the resulting theoretical insights to motivate new formulations. We also propose and study new computational procedures (for example, various greedy algorithms and other local search procedures) that can solve complex optimization problems, and we demonstrate the effectiveness and efficiency of such procedures.
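As a rough illustration of how the group Lasso and standard Lasso penalties differ, the sketch below compares their proximal operators, the shrinkage steps a proximal gradient solver would apply. The function names, the example weights, and the grouping are assumptions made for illustration and are not taken from the projects themselves.

```python
import math

def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1 (standard Lasso penalty).

    Each coordinate is shrunk toward zero independently.
    """
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in w]

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (group Lasso penalty).

    Each group of coordinates is shrunk, or zeroed out, as a whole.
    `groups` is a list of index lists that partition the coordinates.
    """
    out = [0.0] * len(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        if norm > lam:
            scale = 1.0 - lam / norm
            for i in g:
                out[i] = scale * w[i]
    return out

# Hypothetical example: one strong group and one weak group.
w = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
lasso_w = soft_threshold(w, 1.0)
group_w = group_soft_threshold(w, groups, 1.0)
```

In this toy case the standard Lasso step shrinks every coordinate by the same amount, while the group Lasso step keeps the direction of the strong group intact and removes the weak group entirely, which is the kind of structured sparsity the theory aims to explain.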

Spectral Methods for Learning Time Series and Graphical Models, supported by NSF. This project studies a new class of machine learning algorithms for graphical models, based on matrix factorization techniques. It employs ideas from control theory, machine learning, and Bayesian statistics. An example of this research is the proposal and analysis of a method that can learn a hidden Markov model effectively while avoiding the local minima problem of the traditional EM algorithm.

Machine Learning Methods for Big Data Analytics, supported by NSF. This project studies methods for scalable data analysis, including sampling methods, dimensionality reduction techniques, and large-scale optimization.
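One common dimensionality reduction technique of the kind mentioned above is Gaussian random projection. The sketch below is a minimal pure-Python illustration, not the project's own method; the dimensions, seeds, and function names are assumptions chosen for the example.

```python
import math
import random

def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional vector x to k dimensions by computing R x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]

def dist(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical data: two random points in d = 1000 dimensions.
d, k = 1000, 200
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = [rng.gauss(0.0, 1.0) for _ in range(d)]

R = random_projection_matrix(d, k)
# Ratio of the projected distance to the original distance.
ratio = dist(project(R, x), project(R, y)) / dist(x, y)
```

With the 1/sqrt(k) scaling, squared distances are preserved in expectation, so `ratio` should typically be close to 1 even though the projected vectors are five times shorter.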

Machine Learning in Biological Array Analysis and Influenza Prediction, supported by NIH. The goal of this project is to apply machine learning to the analysis of biological arrays. Specifically, we try to build mathematical models that can accurately predict the evolutionary trend of influenza viruses from DNA sequences as well as biological array data. The outcome can be used for more reliable flu vaccine selection, outbreak monitoring, and prevention.


[Tong Zhang's Homepage]