Big Data Analysis and Mining in Microbiome
Published: 2017-06-01  Author: Academic and Teaching Resource Center



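Speaker: Professor Xiaohua Tony Hu, full professor at the College of Information Science and Technology, Drexel University, and director of the Data Mining and Bioinformatics Lab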


We know little about the microbial world. Microbiome sequencing (i.e., metagenome and 16S rRNA sequencing) extracts DNA directly from a microbial environment without culturing any species. Recently, huge amounts of data have been generated by many microbiome projects, such as the Human Microbiome Project (HMP) and Metagenomics of the Human Intestinal Tract (MetaHIT), among others. Analyzing these data will help us better understand the function and structure of the microbial communities of the human body, the earth, and other environmental ecosystems. However, the huge data volume, the complexity of microbial communities, and the intricate data properties have created many opportunities and challenges for data analysis and mining. For example, it is estimated that the microbial ecosystem of the human gut contains about 1000 kinds of bacteria, with 10 billion bacteria and more than 4 million genes in more than 6000 orthologous gene families. The challenges stem from the complex properties of microbiome data: large scale, complexity, diversity, correlation, compositionality, hierarchy, incompleteness, and so on. Current microbiome data analysis methods seldom consider these properties and often make assumptions, such as linearity, Euclidean space, metric space, or continuous data types, that conflict with the true properties of the data. For example, some similarities are non-metric because of the prevalence of some species, and the interactions among species and the environment are complex and of high order. It is therefore urgent to develop novel computational methods that overcome these assumptions and take the properties of microbiome data into account in the analysis procedure. In this talk, we will discuss some computational methods for analyzing and visualizing microbiome big data.
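To make the non-metric point concrete, here is a minimal Python sketch. It assumes the Bray-Curtis dissimilarity, a measure widely used for abundance data in microbial ecology; the abstract does not name a specific measure, so this choice, the helper names, and the toy two-species samples are illustrative assumptions. The check shows three samples for which the triangle inequality fails.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


def violates_triangle(x, y, z, dist):
    """True if the detour x -> z -> y is strictly shorter than x -> y directly."""
    return dist(x, y) > dist(x, z) + dist(z, y) + 1e-12


# Hypothetical toy samples: counts of two species per sample.
sample_a = [1, 0]  # only species 1
sample_b = [0, 1]  # only species 2
sample_c = [1, 1]  # both species present

d_ab = bray_curtis(sample_a, sample_b)
d_ac = bray_curtis(sample_a, sample_c)
d_cb = bray_curtis(sample_c, sample_b)
print(d_ab, d_ac + d_cb,
      violates_triangle(sample_a, sample_b, sample_c, bray_curtis))
```

Here the direct dissimilarity between samples a and b exceeds the sum of their dissimilarities to sample c, so methods that rely on metric-space properties may behave unexpectedly on such data.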
Our studies focus on 1) novel machine learning and computational techniques for dimension reduction and visualization of microbiome data based on non-Euclidean spaces (manifold learning), which discover nonlinear intrinsic features and patterns in these data and overcome the linear assumptions; and 2) novel statistical methods for variable selection in microbiome data that integrate group information among variables.
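As one illustration of variable selection with group information, the sketch below applies the block soft-thresholding step used by group-lasso-style penalties, which keeps or discards whole groups of variables together. The abstract does not specify the statistical method, so the group lasso, the function name `group_soft_threshold`, the grouping (for example, taxa from the same genus), and the numbers are all assumptions for illustration.

```python
import math


def group_soft_threshold(weights, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (unweighted group lasso penalty).

    weights: list of floats, one coefficient per variable
    groups:  list of index lists; each index belongs to exactly one group
    lam:     non-negative threshold
    """
    result = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; drop it if its norm is below lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            result[i] = scale * weights[i]
    return result


# Hypothetical coefficients for five taxa in two groups (e.g., two genera).
coefs = [3.0, 4.0, 0.3, -0.4, 0.0]
taxa_groups = [[0, 1], [2, 3, 4]]
shrunk = group_soft_threshold(coefs, taxa_groups, lam=1.0)
selected_groups = [g for g in taxa_groups if any(shrunk[i] != 0.0 for i in g)]
print(shrunk, selected_groups)
```

In this toy case the first group has a large joint norm and survives with shrunken coefficients, while the second group is zeroed out as a whole, which is the sense in which group structure guides the selection.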

Xiaohua Tony Hu is a full professor and the founding director of the Data Mining and Bioinformatics Lab at the College of Computing and Informatics. He also serves as the founding Co-Director of the NSF Center (I/U CRC) on Visual and Decision Informatics (NSF CVDI), IEEE Computer Society Bioinformatics and Biomedicine Steering Committee Chair, and IEEE Computer Society Big Data Steering Committee Chair. He joined Drexel University in 2002. He founded the International Journal of Data Mining and Bioinformatics (SCI indexed) in 2006. Earlier, he worked as a research scientist at world-leading R&D centers such as Nortel Research Center and Verizon Lab (the former GTE Labs). In 2001, he founded DMW Software in Silicon Valley, California. He has extensive experience and expertise in turning original ideas into research prototypes and eventually into commercial products. Many of his research ideas have been integrated into commercial products and applications in data mining, fraud detection, and database marketing.

Tony’s current research interests are in data/text/web mining, big data, bioinformatics, information retrieval and information extraction, social network analysis, healthcare informatics, and rough set theory and its applications. He has published more than 270 peer-reviewed research papers in various journals, conferences, and books. He has obtained more than US$8.5 million in research grants in the past 10 years as PI or Co-PI (PI of 9 NSF grants). He graduated 19 Ph.D. students from 2006 to 2017 and is currently supervising 9 Ph.D. students.