OmniEdge Scientific
Self-paced
Beginner to Advanced

AI & Machine Learning for Biologists

Classical ML • Deep Learning • Protein AI • Drug Discovery • XAI. The course runs from scikit-learn classifiers to CNNs and LSTMs, transformer-based protein models (ESM-2 language models, AlphaFold2 structure prediction), graph neural networks for molecules, and explainable AI for biomedical research.

Program Overview

This curriculum equips biologists with the practical AI and machine-learning skills that are reshaping modern life sciences. It assumes no prior programming experience and builds progressively through classical machine learning, deep learning, large language models for proteins, generative AI, and explainability. Every concept is anchored to a real biological question:

  • Can we predict cancer from expression data?
  • Can AI fold a protein we have never seen before?
  • Can we design new drug molecules computationally?

Foundations of AI & ML for Biology

  • What is AI/ML? A Biologist's Map — AI vs ML vs Deep Learning, supervised / unsupervised / reinforcement learning with biological examples.
  • Data Preparation & Feature Engineering — Pandas, missing-value handling, StandardScaler/MinMaxScaler, one-hot encoding of amino acids/nucleotides.
  • Exploratory Data Analysis — correlation heatmaps, PCA scatter plots, violin/box plots with seaborn and matplotlib.
  • Supervised Learning — Classification — logistic regression, KNN, accuracy/precision/recall/F1, confusion matrices.
  • Supervised Learning — Regression & Trees — linear/ridge regression, decision trees, random forest, feature importance for biomarker discovery.
  • Unsupervised Learning — K-Means, hierarchical clustering, UMAP, t-SNE, silhouette score.
  • Model Evaluation & Tuning — GridSearchCV, ROC-AUC, scikit-learn Pipelines, joblib model persistence for reproducibility.
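
As a small illustration of the classification metrics listed above, the sketch below computes accuracy, precision, recall and F1 from a binary confusion matrix in plain Python. The labels are invented toy data (1 = tumour, 0 = normal) and the function name binary_metrics is hypothetical. In practice these values would normally come from scikit-learn rather than hand-written code.

```python
def binary_metrics(y_true, y_pred):
    """Return confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (1 = tumour, 0 = normal).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Writing the counts out by hand makes the trade-off visible: precision penalises false positives, recall penalises false negatives, and F1 is their harmonic mean.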

Deep Learning for Biological Sequences & Images

  • Neural Network Foundations — perceptron, ReLU/sigmoid/softmax, MLPs in Keras, Adam, binary cross-entropy.
  • Sequence Encoding & Embeddings — one-hot DNA/protein, k-mer vectors, ProtVec-style embeddings, Keras Embedding layers.
  • Convolutional Neural Networks — 1D CNNs for motif/splice-site detection, 2D CNNs for microscopy, transfer learning.
  • RNNs & LSTMs — LSTM/GRU/Bidirectional layers, secondary structure prediction (helix/sheet/coil), temporal expression.
  • Transformers & Protein Language Models — attention mechanism, ESM-2, ProtTrans, ESMFold, HuggingFace transformers.
  • AlphaFold2 & Structure Prediction — ColabFold, pLDDT and PAE interpretation, py3Dmol/nglview visualisation.
  • Generative Models — VAEs & GANs — latent-space drug-molecule generation, SMILES, RDKit, DeepChem.
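
The sequence encodings named in this module can be sketched in a few lines of plain Python. The example below shows one-hot encoding of a DNA string and k-mer counting. The helper names one_hot_dna and kmer_counts are hypothetical, the A, C, G, T column order is an assumed convention, and mapping unknown bases such as N to an all-zero row is one common choice among several.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # assumed column order for the one-hot rows


def one_hot_dna(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


encoded = one_hot_dna("ACGTN")
counts = kmer_counts("ATGATGA", k=3)

print(encoded)
print(dict(counts))
```

A one-hot matrix of this shape is the usual input to a 1D CNN, while k-mer counts give a fixed-length vector that classical models can use directly.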

Applied AI in Bioinformatics & Drug Discovery

  • Graph Neural Networks for Molecules — atoms-as-nodes/bonds-as-edges, MPNNs, PyTorch Geometric, DGL-LifeSci. Project: ESOL solubility prediction.
  • AI for Drug Discovery & Virtual Screening — QSAR, DeepChem ADMET, docking-score prediction, active learning.
  • Deep Learning for Genomics — Enformer-style expression prediction, DeepSEA/Basset, saliency maps and in-silico mutagenesis.
  • AI for Medical Imaging & Pathology — ResNet50/EfficientNet transfer learning, augmentation, Grad-CAM explainability.
  • Single-Cell AI — scVI, automated cell-type annotation with scANVI, scGAN, trajectory inference (PAGA / Monocle3).
  • Explainable AI in Biology — SHAP, LIME, attention visualisation, biological validation of ML-identified biomarkers.
  • Capstone — End-to-End AI Biology Pipeline — data → preprocessing → training → evaluation → interpretation, with model card and reproducibility.
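
The atoms-as-nodes, bonds-as-edges idea behind the graph neural network topic can be shown without any deep-learning library. The sketch below builds the heavy-atom graph of ethanol (C-C-O) by hand and runs one round of sum-aggregation message passing. It is a deliberately simplified illustration: there are no learned weights, the two-element feature vectors and the function name message_passing_round are hypothetical, and it is not the PyTorch Geometric or DGL-LifeSci API.

```python
# Heavy atoms of ethanol and the bonds between them (hydrogens omitted).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# Toy node features: one-hot over the elements [C, O] (assumed for this sketch).
ELEMENT_FEATURES = {"C": [1, 0], "O": [0, 1]}
features = [ELEMENT_FEATURES[a] for a in atoms]

# Undirected adjacency list built from the bond list.
neighbors = {i: [] for i in range(len(atoms))}
for u, v in bonds:
    neighbors[u].append(v)
    neighbors[v].append(u)


def message_passing_round(features, neighbors):
    """One unweighted update: each node adds the sum of its neighbours' features."""
    updated = []
    for node, own in enumerate(features):
        new = list(own)
        for nb in neighbors[node]:
            new = [a + b for a, b in zip(new, features[nb])]
        updated.append(new)
    return updated


updated = message_passing_round(features, neighbors)

# Sum readout: pool node vectors into a single molecule-level vector.
molecule_vector = [sum(col) for col in zip(*updated)]

print(updated)
print(molecule_vector)
```

After one round, each atom's vector already reflects its immediate chemical neighbourhood. A real MPNN replaces the plain sums with learned transformations and stacks several such rounds before the readout.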

Tools You Will Master

Python, Pandas, scikit-learn, TensorFlow/Keras, PyTorch, HuggingFace Transformers, fair-esm (ESM-2), ColabFold, RDKit, DeepChem, PyTorch Geometric, scvi-tools, Scanpy, SHAP, LIME, Captum, MLflow.

Lessons
No lessons published yet.