Metagenomics, the study of genetic material recovered directly from environmental samples without isolating and culturing organisms, has become one of the principal tools of “meta-omic” analysis. It can be used to explore the diversity, function, and ecology of whole microbial ecosystems. The broad field may also be referred to as environmental genomics, ecogenomics, or community genomics. While traditional microbiology and genomics rely on cultivated clonal cultures, early environmental gene sequencing cloned specific marker genes (often the 16S/18S rRNA gene) to produce a profile of the diversity in a natural sample. Such work revealed that the vast majority of microbial diversity had been missed by cultivation-based methods. Recent studies use either shotgun (WGS) or amplicon (16S/18S) sequencing to sample all the members of a community with far less bias than cultivation allows. Shotgun metagenomics (also known as quantitative metagenomics) is more expensive but offers much higher resolution. This course will cover all the steps from sampling to data analysis.

R: this group will make extensive use of the R programming language. If you are not familiar with R, you are required to complete this free DataCamp course: https://www.datacamp.com/courses/free-introduction-to-r.


    • Definition of metagenomics
    • Technologies: motivation for 16S rRNA and WGS sequencing
    • NGS data preprocessing (QC, assembly, alignment, abundance counting)
    • Whole-community analysis
    • Assembly, binning and analytical pipelines
    • Exploratory data analysis, clustering and multiple testing
    • Differential abundance testing, sequencing depth and rarefaction curves
    • Longitudinal analyses
    • Annotation and functional analysis
    • Analyzing a metagenome and assessing metagenomic sequence quality
    • Comparing metagenomes: big data analytics for metagenomics

    Thanh Le Viet
    Thanh obtained a B.E. in Mapping and an MSc in Cartography, Remote Sensing, and Geographic Information Systems (GIS) from the Hanoi University of Mining and Geology. With a strong background in GIS, Thanh joined the Oxford University Clinical Research Unit (OUCRU), Hanoi in 2008 to work on the Atlas of Human Infectious Diseases. He completed his PhD at the University of Montpellier 2 in 2015, studying dengue transmission in Hanoi with computational approaches such as spatial statistics, machine learning and phylogeography. He currently works at OUCRU as a postdoctoral researcher and will focus on bioinformatics and machine learning to study antimicrobial resistance.
    Joseph Paulson
    Joseph Paulson is a statistical scientist at Genentech. He studied mathematics and completed a Ph.D. in Applied Mathematics, Statistics and Scientific Computation as a National Science Foundation Graduate Fellow at the University of Maryland, College Park in 2015. Under the guidance of Mihai Pop and Hector Corrada-Bravo, he began developing computational methods for the analysis of high-throughput sequencing data, in particular metagenomics. He then moved to the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and the Department of Biostatistics at the Harvard School of Public Health under Professor John Quackenbush, where he began exploring large network-based approaches to the computational study of host-pathogen interactions. The core of his interests involves accounting for sequencing artifacts, such as under-sampling, to reach robust biological and translational interpretations. He is committed to open-source software and has contributed to Bioconductor and popular metagenomic pipelines.
    Edi Prifti
    Edi Prifti graduated in biomedical informatics in 2007 and received his PhD in bioinformatics from the Pierre and Marie Curie University (Paris) in 2011. His research focused on integrative centrality measures in omics-derived networks applied to complex diseases such as obesity and diabetes. After joining the INRA/MetaGenoPolis lab in 2010, he focused extensively on developing methods and tools (the MetaOMineR package suite) for the analysis of very large quantitative metagenomics datasets and applied them to multiple medical conditions (obesity, liver cirrhosis, diabetes, HIV, etc.). In 2015 he joined the Institute of Cardiometabolism and Nutrition (ICAN) as a researcher, and he is at present the deputy director of the IntegrOmics department. He is particularly interested in exploring and understanding the microbial ecosystem that inhabits the human gut and is tightly associated with health and disease.
    Ari Ugarte
    Ari Ugarte graduated from the Monterrey Institute of Technology and Higher Education (Mexico) in 2006. He then worked in industry as a software engineer for four years, learning best practices in software production. He moved to Paris, where he completed a Master's degree in Bioinformatics and Modeling (2012) and obtained his PhD in functional annotation of metagenomics data (2016) from the Pierre and Marie Curie University. He is currently a postdoctoral researcher at the Institute of Cardiometabolism and Nutrition (ICAN), where he applies big data methods to the large-scale functional annotation of the human microbiome. His research focuses on the characterization and prediction of highly diverged protein sequences and on meta-learning strategies to improve the accuracy and precision of predictions. He is also interested in new algorithms for the abstraction and reduction of multi-dimensional data.
    Jean-Daniel Zucker
    Jean-Daniel Zucker graduated from ENSAE (National Higher School of Aeronautics and Space) in 1995. He also graduated in artificial intelligence in 1992. He obtained his PhD in Machine Learning in 1996 from the University of Paris 6, where he became an associate professor focusing on relational machine learning. In 2002 he became a Full Professor of Computer Science at the University of Paris 13, where he founded a laboratory for Medical Informatics and Bioinformatics (LIM&BIO) and headed a team on Prediction Analysis for Transcriptomics Data. In 2008 he became a Senior Researcher at the National Institute of Research for Development (IRD), working on data mining and decentralized AI for complex systems modeling. He is now the director of UMMISCO, the Mathematical and Computer Modeling of Complex Systems Laboratory (IRD & University of Paris 6), which has 67 permanent staff in France, Vietnam, Morocco, Senegal and Cameroon. He also heads INTEGROMICS, the bioinformatics department of ICAN, the Institute of Cardiometabolism and Nutrition. His research focuses on AI approaches for the automatic construction of predictive models (supervised learning) or characteristic models (unsupervised learning, or "clustering"). His main field of application today is the metagenomics of the gut microbiota, and he has contributed to several European networks in genetics and functional genomics (Diogenes, METAHIT, METACARDIS, ...). His research is carried out through international collaborations with Vietnam, China, Taiwan, the USA and Italy. He was posted in Vietnam for five years (2011-2015).