Beginner
Next-Generation Sequencing (NGS) Analysis
Master the complete Next-Generation Sequencing (NGS) workflow. This course provides hands-on experience in a Linux environment (Windows Subsystem for Linux, WSL), covering everything from quality control of raw data to genome assembly and sequence alignment.
- Environment: Linux on Windows (WSL) setup and command-line mastery.
- Core Workflows: QC, Trimming, Assembly (SPAdes), and Alignment (BWA/SAMtools).
- Data Analysis: Working with FASTA, FASTQ, BAM, and VCF formats.
Course Overview
This course provides a comprehensive introduction to NGS data analysis, covering essential computational skills, biological data handling, and practical workflows. Participants gain hands-on experience with Linux-based environments and widely used bioinformatics tools applied to real genomic datasets.
What You Will Learn
Module 0 & 1: Linux Environment & NGS Fundamentals
- Linux on Windows: Setting up WSL and understanding why Linux is the standard platform for most bioinformatics tools.
- Command Line: Practical navigation, file system structure, and essential shell commands.
- Workflow Design: Overview of end-to-end NGS analysis and setting up mini-projects.
Module 2 & 3: Data Retrieval & Similarity Search
- NCBI Databases: Accessing real genomic data and reference genomes (e.g., E. coli).
- File Formats: Deep dive into FASTA, FASTQ, BAM, and VCF structures.
- BLAST Analysis: Running BLAST locally versus through the web interface, and interpreting the output.
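To make the FASTA and FASTQ structures above concrete, here is a minimal Python sketch that parses both formats from in-memory text. It assumes FASTA sequences may be wrapped over several lines and that FASTQ records use the common unwrapped four-line layout (`@id`, sequence, `+`, quality). The function names `parse_fasta` and `parse_fastq` are hypothetical; in practice a library such as Biopython's `SeqIO` is the usual choice.

```python
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines may be wrapped."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) triples.

    Assumes the unwrapped 4-line layout: @id, sequence, '+', quality.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        head, seq, plus, qual = lines[i:i + 4]
        if not head.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield head[1:], seq, qual
```

For example, `list(parse_fasta(">seq1 demo\nACGT\nTTGA\n"))` returns `[('seq1 demo', 'ACGTTTGA')]`: the wrapped sequence lines are joined into one string.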
Module 4: Quality Control (QC)
- Raw Data Assessment: Understanding Phred quality scores and FastQC reports.
- Data Cleaning: Hands-on read trimming and filtering techniques to improve data quality.
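Phred scores relate to the base-call error probability P by Q = -10 log10(P), so Q20 means a 1% chance of error and Q30 a 0.1% chance. Most current FASTQ files store Q as the ASCII character with code Q + 33 (Phred+33). The Python sketch below decodes a quality string and applies a deliberately simple trimming rule: remove trailing 3' bases below a quality threshold, then discard reads shorter than a minimum length. The threshold of 20, the Phred+33 offset, and the helper names are assumptions for illustration; dedicated trimmers use more elaborate algorithms (for example, sliding windows).

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoding (Sanger / Illumina 1.8+)


def decode_quality(qual: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert an ASCII quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to recover the error probability P."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, qual: str, threshold: int = 20,
                min_length: int = 1) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from the 3' end, then filter by length.

    Hypothetical simplified rule, not any specific tool's algorithm:
    drop trailing bases whose Phred score is below `threshold`.
    Returns None when the trimmed read is shorter than `min_length`.
    """
    scores = decode_quality(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None  # read is filtered out
    return seq[:end], qual[:end]
```

For example, `decode_quality("II#")` returns `[40, 40, 2]`, and `trim_3prime("ACGTA", "IIII#")` returns `('ACGT', 'IIII')` because the final base (Q2) falls below the threshold.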
Module 5 & 6: Genome Assembly & Alignment
- Genome Assembly: Assembling genomes from reads into contigs and scaffolds using SPAdes, and validating the assembly.
- Sequence Alignment: Aligning reads to reference genomes with BWA and processing the alignments with SAMtools.
- Metrics: Evaluating alignment quality and understanding SAM/BAM formats.
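As a small illustration of the SAM format and of one basic alignment metric, the Python sketch below splits an alignment line into the 11 mandatory SAM fields and computes the fraction of primary reads that mapped, using the FLAG bits 0x4 (unmapped), 0x100 (secondary), and 0x800 (supplementary). The helper names and this exact definition of the mapping rate are assumptions; `samtools flagstat` is the usual tool for such summaries and reports its own set of counts.

```python
from typing import Dict, Iterable

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one SAM alignment line into named fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need at least 11 tab-separated fields")
    rec: Dict[str, object] = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(cols[SAM_FIELDS.index(key)])  # numeric columns
    rec["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, kept as strings
    return rec


def is_reverse(flag: int) -> bool:
    """True if the read aligned to the reverse strand (bit 0x10)."""
    return bool(flag & FLAG_REVERSE)


def mapping_rate(lines: Iterable[str]) -> float:
    """Fraction of primary alignment records that are mapped."""
    primary = mapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, ...)
        flag = int(parse_sam_line(line)["FLAG"])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        primary += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
    return mapped / primary if primary else 0.0
```

Skipping secondary and supplementary records keeps a read with several alignments from being counted more than once.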
Module 7: Variant Calling (Conceptual)
- Genetic Variation: Principles of identifying variants from sequencing data.
- Pipelines: Understanding the logic of industry-standard variant calling workflows and how to interpret VCF files.
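To illustrate VCF interpretation, the Python sketch below reads the eight fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), splits multi-allelic ALT values, parses INFO `key=value` pairs and flags, and labels each ALT allele by comparing its length with REF. The classification rule is a simplified assumption that ignores variant normalization and treats symbolic alleles such as `<DEL>` as "other"; the helper names are hypothetical.

```python
from typing import Dict, Union

VCF_FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse an INFO column; flag-type entries (no '=') map to True."""
    result: Dict[str, Union[str, bool]] = {}
    if info == ".":
        return result  # missing INFO
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key] = value
        else:
            result[item] = True
    return result


def classify(ref: str, alt: str) -> str:
    """Simplified length-based label for one REF/ALT pair."""
    if alt.startswith("<") or alt in (".", "*"):
        return "other"  # symbolic, missing, or spanning-deletion allele
    if len(ref) == len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"  # multi-nucleotide substitution
    return "insertion" if len(alt) > len(ref) else "deletion"


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Parse the fixed columns of one VCF data line (sample columns ignored)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("VCF data lines need at least 8 tab-separated fields")
    rec = dict(zip(VCF_FIXED, cols[:8]))
    alts = rec["ALT"].split(",")  # multi-allelic sites list several ALTs
    return {
        "chrom": rec["CHROM"],
        "pos": int(rec["POS"]),
        "ref": rec["REF"],
        "alts": alts,
        "types": [classify(rec["REF"], a) for a in alts],
        "qual": None if rec["QUAL"] == "." else float(rec["QUAL"]),
        "filter": rec["FILTER"],
        "info": parse_info(rec["INFO"]),
    }
```

For example, the line `chr1  12345  .  A  G,AT  50  PASS  DP=30;DB` (tab-separated) yields ALT alleles `['G', 'AT']` labelled `['SNV', 'insertion']`, with INFO parsed as `{'DP': '30', 'DB': True}`.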
Learning Outcomes
- Work efficiently in a Linux-based bioinformatics environment.
- Handle real-world NGS datasets independently.
- Perform QC, Assembly, and Alignment using professional tools.
- Translate raw sequencing data into meaningful biological insights.