Pipeline Results
End-to-end metrics from the WeCanVax neoantigen discovery pipeline, covering somatic variant profiling (Module 1), RNA-seq expression validation (Module 2), MHC/DLA genotyping (Module 3), MHC binding prediction (Module 4), and vaccine cassette design (Module 5).
Candidate Discovery Funnel
The pipeline reduces thousands of raw variant calls to a handful of tumor-specific target epitopes that are predicted to be immunogenic. The 27 protein-altering mutations are evaluated along two parallel tracks: each mutation is checked for RNA expression, while all mutated regions are translated into peptides and scored in 20,796 MHC binding predictions. The intersection of the two result sets yields the vaccine targets, preferring highly expressed alleles but allowing fallbacks.
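The intersection step described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `BindingPrediction` record, the `variant_id` key, and the `preferred_alleles` set are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingPrediction:
    variant_id: str   # hypothetical key linking a peptide to its source mutation
    allele: str       # MHC allele the peptide was scored against
    peptide: str
    el_score: float   # predicted eluted-ligand score


def intersect_tracks(validated_variants, binders):
    """Keep only binders whose source variant passed RNA validation."""
    validated = set(validated_variants)
    return [b for b in binders if b.variant_id in validated]


def rank_with_fallback(binders, preferred_alleles):
    """Sort binders so preferred (highly expressed) alleles come first.

    Binders on other alleles are kept as fallbacks; within each group,
    higher EL scores rank first.
    """
    return sorted(
        binders,
        key=lambda b: (b.allele not in preferred_alleles, -b.el_score),
    )
```

Sorting on a tuple keeps the fallback candidates in the list instead of discarding them, which matches the "preferring but allowing fallbacks" wording above.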
1. Variant Discovery
- Raw Variants: 9,876
- High Confidence: 356
- Protein Altering: 27
Mutect2 unfiltered calls down to coding mutations only.
2. Parallel Evaluation
- Tumor Expressed: 21
- Total Predictions: 20,796
- MHC Binders: 394
RNA validated vs MHC binding predictions derived from the 27 mutations.
3. Final Selection
- Validated Binders: 90
- Vaccine Targets: 15
- Final Cassette: 1 DNA Construct
The intersection yields 10 Class I and 5 Class II targets, encoded in a single 765 bp construct.
Module 1 — Somatic Mutation Profiling
QC & Trimming
| Metric | Target | Normal (SRR7780918) | Tumor (SRR7780919) |
|---|---|---|---|
| Input reads (PE) | ≥ 100M (Fresh) / ≥ 300M (FFPE) | 38.5M × 2 | 90.0M × 2 |
| Reads passed | ≥ 90% of Input | 73.2M | 172.1M |
| Low quality removed | < 5% | 3.7M | 7.6M |
| Adapter/Short/N | < 5% | 59.1K | 278.9K |
| Q30 bases | ≥ 80% | 92.35% | 91.71% |
| GC Content | 40 - 55% | 48.02% | 48.01% |
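The ratio-based targets in this table can be checked with a few lines of arithmetic. The sketch below assumes that the percentage targets are measured against the total number of input reads (two reads per pair); the source does not state the denominator, so treat that as an assumption. The function name `qc_report` is hypothetical.

```python
def qc_report(input_pairs_m, passed_m, low_quality_m, adapter_m, q30_pct, gc_pct):
    """Evaluate the ratio-based QC targets; read counts are in millions."""
    # Assumption: paired-end input, so total reads = 2 x read pairs.
    total_reads_m = 2 * input_pairs_m
    return {
        "pass_rate_ok": passed_m / total_reads_m >= 0.90,
        "low_quality_ok": low_quality_m / total_reads_m < 0.05,
        "adapter_ok": adapter_m / total_reads_m < 0.05,
        "q30_ok": q30_pct >= 80.0,
        "gc_ok": 40.0 <= gc_pct <= 55.0,
    }


# Values copied from the QC & Trimming table (59.1K = 0.0591M, 278.9K = 0.2789M).
normal = qc_report(38.5, 73.2, 3.7, 0.0591, 92.35, 48.02)
tumor = qc_report(90.0, 172.1, 7.6, 0.2789, 91.71, 48.01)
```

The input-read target is left out on purpose, because the table does not say whether it counts reads or read pairs.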
Alignment
| Metric | Target | Normal (SRR7780918) | Tumor (SRR7780919) |
|---|---|---|---|
| Total reads | | 73.4M | 172.4M |
| Target Coverage (est.) | ≥ 100x (N) / ≥ 300x (T) | ~127x | ~303x |
| Mapping rate | ≥ 95% | 99.72% | 99.78% |
| Properly paired | ≥ 90% | 98.96% | 99.19% |
| Duplicates | ≤ 20% | 5.04M (13.8%) | 10.5M (12.3%) |
| Singletons | < 1% | 0.09% | 0.06% |
Variant Calling & Annotation Process
Top Coding Variants
| Gene | Location | Mutation (protein) | Consequence | Normal VAF (AD/DP) | Tumor VAF (AD/DP) |
|---|---|---|---|---|---|
| - | 18:55018082 | A > T | intron variant | 10.0% (0/8) | 45.7% (11/24) |
| - | 4:17146493 | T > C p.Lys20Arg | missense variant | 6.5% (0/10) | 42.9% (7/16) |
| - | 4:17146494 | T > TGGAAGGTGGGCAGGGGGCACAGAAAGGGGGAAGGCGTCAGGTCGCCCGGGGCGCTCGCCCTGGACC p.Gly19_Lys20insGlyProGlyArgAlaProArgAlaThrTer | stop gained & inframe insertion | 6.5% (0/10) | 42.9% (7/16) |
| OR6C3I | 3:31291411 | C > G p.Gln294Glu | missense variant | 6.6% (0/12) | 36.4% (7/20) |
| - | 26:27309237 | C > T p.Thr15Ile | missense variant & splice region variant | 9.1% (0/8) | 34.5% (13/37) |
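The AD/DP pairs in this table allow a quick sanity check of the reported tumor VAFs. The sketch below computes the raw ratio of alternate reads to depth and compares it with the reported percentage. The reported values differ slightly from the raw ratios (and the normal-sample VAFs are non-zero despite zero alternate reads), which would be consistent with a model-estimated allele fraction rather than a simple read ratio. That interpretation is an assumption, not something the source states.

```python
def raw_vaf(alt_depth, total_depth):
    """Raw variant allele fraction: alternate-supporting reads / total depth."""
    if total_depth == 0:
        return None  # no coverage, fraction is undefined
    return alt_depth / total_depth


# (location, alt reads, depth, reported tumor VAF in %) copied from the table.
tumor_rows = [
    ("18:55018082", 11, 24, 45.7),
    ("4:17146493", 7, 16, 42.9),
    ("3:31291411", 7, 20, 36.4),
    ("26:27309237", 13, 37, 34.5),
]

for location, alt, depth, reported_pct in tumor_rows:
    raw_pct = 100 * raw_vaf(alt, depth)
    print(f"{location}: raw {raw_pct:.1f}% vs reported {reported_pct:.1f}%")
```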
Module 2 — RNA-seq Expression Validation
STAR Alignment & Expression
Alignment Stats
- Total read pairs: 33.7M
- Uniquely mapped: 93.38%
- Multi-mapped: 3.62%
- Overall mapped: 97.00%
STAR two-pass alignment with TranscriptomeSAM for RSEM.
Expression Quantification
- Tool: RSEM
- Transcripts quantified: 60,994
- EM convergence: 6,122 rounds
- Top gene TPM: 42,740
RSEM gene-level TPM used for expression filtering.
Variant Validation
RNA Validation Summary
- Total PASS variants: 356
- VALIDATED (alt ≥ 2, VAF ≥ 5%): 21
- WEAK_SUPPORT: 10
- NOT_EXPRESSED: 30
- NO_COVERAGE: 295
bam-readcount for SNVs, bcftools mpileup for indels.
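The four validation categories can be illustrated with a small classifier. Only the VALIDATED rule (alt ≥ 2, VAF ≥ 5%) comes from the summary above. The boundaries used here for NO_COVERAGE (zero depth), NOT_EXPRESSED (coverage but no alternate reads), and WEAK_SUPPORT (everything else) are assumptions made for the sake of the example, and the function name is hypothetical.

```python
def classify_rna_support(ref_reads, alt_reads, min_alt=2, min_vaf=0.05):
    """Assign an RNA validation category from RNA-seq read counts at a variant site."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return "NO_COVERAGE"      # assumption: no RNA reads cover the site
    if alt_reads == 0:
        return "NOT_EXPRESSED"    # assumption: covered, but no alternate allele seen
    vaf = alt_reads / depth
    if alt_reads >= min_alt and vaf >= min_vaf:
        return "VALIDATED"        # rule stated in the summary
    return "WEAK_SUPPORT"         # assumption: some alternate reads, below threshold
```

For instance, a site with 10 reference and 5 alternate reads would be VALIDATED, while a site with 100 reference and 2 alternate reads falls below the 5% VAF threshold and would be WEAK_SUPPORT under these assumptions.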
Expression Integration
- VALIDATED + TPM ≥ 1: 16
- VALIDATED + TPM < 1: 5
- Strong candidates: 16
Strong candidates: RNA-validated with sufficient gene expression.
Validation Rate
- RNA validated / PASS: 5.9%
- With expression / validated: 76.2%
- NO_COVERAGE rate: 82.9%
A high NO_COVERAGE rate is expected, since most variants fall in regions not captured by RNA-seq.
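The rates in this section follow directly from the counts in the RNA validation summary and the expression integration step. A short worked check, with the variable names chosen only for illustration:

```python
# Counts copied from the RNA Validation Summary and Expression Integration cards.
counts = {
    "VALIDATED": 21,
    "WEAK_SUPPORT": 10,
    "NOT_EXPRESSED": 30,
    "NO_COVERAGE": 295,
}
strong_candidates = 16  # VALIDATED variants with gene TPM >= 1

total_pass = sum(counts.values())  # should equal the 356 PASS variants


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


rates = {
    "rna_validated_of_pass": pct(counts["VALIDATED"], total_pass),
    "expressed_of_validated": pct(strong_candidates, counts["VALIDATED"]),
    "no_coverage_rate": pct(counts["NO_COVERAGE"], total_pass),
}
```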
Module 3 — MHC/DLA Genotyping
Accurate neoantigen prediction requires knowing the exact MHC (Dog Leukocyte Antigen, DLA) alleles of the patient. The pipeline uses both WES (for comprehensive mapping) and RNA-seq (to confirm active expression) to resolve the highly polymorphic DLA loci. The table below highlights the confirmed and actively expressed alleles essential for accurate Class I (CD8+) and Class II (CD4+) epitope prediction.
Expressed DLA Genotypes
| Locus | Confirmed Allele | Coverage (breadth) | Mean Depth (WES) | RNA Expression (TPM) | Status |
|---|---|---|---|---|---|
| Class I (CD8+ Targets) | | | | | |
| DLA-88 | DLA-88*028:01 | 56.0% | 1.05x | 85,683 | Expressed |
| DLA-12 | DLA-12*001:01:01 | 65.0% | 6,640.7x | 298,224 | Expressed |
| DLA-12 | DLA-12*001:04:01 | 74.5% | 1.11x | 5,426 | Expressed |
| DLA-12 | DLA-12*005:01 | 55.5% | 0.58x | 527 | Low Expr |
| DLA-64 | DLA-64*001:01:01 | 98.8% | 11.5x | 63,887 | Expressed |
| DLA-64 | DLA-64*001:02 | 99.3% | 10.0x | 8,534 | Expressed |
| DLA-64 | DLA-64*001:05:01 | 92.7% | 8.26x | 3,151 | Expressed |
| DLA-79 | DLA-79*001:05 | 100% | 111.7x | 10,483 | Expressed |
| DLA-79 | DLA-79*001:02 | 100% | 8.55x | 2,837 | Expressed |
| Class II (CD4+ Targets) | | | | | |
| DLA-DRB1 | DLA-DRB1*006:01 | 100% | 9.54x | 142,371 | Expressed |
| DLA-DRB1 | DLA-DRB1*015:02 | 100% | 6.83x | 137,877 | Expressed |
| DLA-DQA1 | DLA-DQA1*005:01:1 | 100% | 6.64x | 11,295 | Expressed |
| DLA-DQA1 | DLA-DQA1*006:01 | 79.6% | 0.79x | 14,145 | Expressed |
| DLA-DQB1 | DLA-DQB1*023:01 | 98.5% | 5.28x | 44,235 | Expressed |
| DLA-DQB1 | DLA-DQB1*007:01 | 100% | 9.94x | 32,078 | Expressed |
Module 4 — Neoantigen Prediction & Ranking
MHC Binding Predictions
Class I (netMHCpan-4.2)
- Binding candidates: 222
- Strong binders (EL ≥ 0.5): 0
- Weak binders (EL 0.1-0.5): 222
- RNA Validated binders: 34
Predicted binding to patient DLA Class I alleles (8-11mer peptides).
Class II (netMHCIIpan-4.2)
- Binding candidates: 172
- Strong binders (EL ≥ 0.5): 1
- Weak/Marginal binders: 171
- RNA Validated binders: 56
Predicted binding to patient DLA Class II alleles (15-mer peptides).
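The strong/weak split used in both cards can be expressed as a simple threshold rule on the EL score. The sketch below uses the thresholds shown for Class I (strong at EL ≥ 0.5, weak from 0.1 up to 0.5). Treating the lower bound of 0.1 as inclusive, and applying the same cutoffs to the Class II "Weak/Marginal" group, are assumptions.

```python
def classify_binder(el_score, strong_cutoff=0.5, weak_cutoff=0.1):
    """Bin a predicted EL score into strong binder, weak binder, or non-binder."""
    if el_score >= strong_cutoff:
        return "strong"
    if el_score >= weak_cutoff:  # assumption: lower bound is inclusive
        return "weak"
    return "non-binder"


def count_binders(el_scores):
    """Tally binder classes over a collection of EL scores."""
    tally = {"strong": 0, "weak": 0, "non-binder": 0}
    for score in el_scores:
        tally[classify_binder(score)] += 1
    return tally
```

Under this rule, an epitope such as GDFRHTVHV with an EL score of 0.38 would be counted as a weak binder.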
Top Ranked Neoantigen Candidates
| Source Mutation (Gene & AA Change) | DLA Allele | Epitope Sequence | Binding Affinity (EL Score) | DAI Score | Validation |
|---|---|---|---|---|---|
| Class I (CD8+ Targets) | | | | | |
| SETX (D2395A) | DLA-88*028:01 | ATVDGFQGR | 0.23 | 0.19 | VALIDATED |
| - (T15I) | DLA-88*028:01 | HCIGSVASY | 0.23 | 0.14 | VALIDATED |
| - (W46R) | DLA-79*001:05 | RRPLHTHTP | 0.13 | 0.08 | VALIDATED |
| CDC42EP1 (M49V) | DLA-79*001:05 | GDFRHTVHV | 0.38 | 0.04 | VALIDATED |
| CDC42EP1 (M49V) | DLA-64*001:05:01 | GDFRHTVHV | 0.38 | 0.03 | VALIDATED |
| Class II (CD4+ Targets) | | | | | |
| CDC42EP1 (M49V) | DLA-DRB1*006:01 | LGDFRHTVHVGRGGD | 0.33 | 0.13 | VALIDATED |
| CDC42EP1 (M49V) | DLA-DRB1*006:01 | PLGDFRHTVHVGRGG | 0.26 | 0.12 | VALIDATED |
| SETX (D2395A) | DLA-DRB1*015:02 | PAEVATVDGFQGRQK | 0.26 | 0.11 | VALIDATED |
| CDC42EP1 (M49V) | DLA-DRB1*006:01 | GDFRHTVHVGRGGDV | 0.22 | 0.11 | VALIDATED |
| - (W46R) | DLA-DRB1*006:01 | RRPLHTHTPSLSATR | 0.28 | 0.11 | VALIDATED |
| SETX (D2395A) | DLA-DQA1*006:01/DQB1*023:01 | GPAEVATVDGFQGRQ | 0.12 | 0.07 | VALIDATED |
Module 5 — Vaccine Cassette Design & Assembly
Epitope Selection & Cassette
Selected Epitopes
- Class I (CD8+): 10 epitopes
- Class II (CD4+): 5 epitopes
- All RNA-VALIDATED: ✓
- Source genes: PIK3CA, CDC42EP1, GLI3, SETX
Top epitopes by EL_SCORE from Module 4, deduplicated by peptide sequence.
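Selecting the top epitopes by EL score with deduplication by peptide sequence could look like the sketch below. The dictionary keys (`peptide`, `allele`, `el_score`) and the function name are hypothetical, and keeping the first-seen entry when two alleles tie on score is an assumption about tie-breaking.

```python
def select_epitopes(candidates, n):
    """Return the top-n candidates by EL score, one entry per peptide sequence."""
    best = {}
    for cand in candidates:
        pep = cand["peptide"]
        # Keep the highest-scoring entry per peptide; ties keep the first seen.
        if pep not in best or cand["el_score"] > best[pep]["el_score"]:
            best[pep] = cand
    ranked = sorted(best.values(), key=lambda c: c["el_score"], reverse=True)
    return ranked[:n]


# Class I rows copied from the Top Ranked Neoantigen Candidates table.
class1 = [
    {"peptide": "GDFRHTVHV", "allele": "DLA-79*001:05", "el_score": 0.38},
    {"peptide": "GDFRHTVHV", "allele": "DLA-64*001:05:01", "el_score": 0.38},
    {"peptide": "ATVDGFQGR", "allele": "DLA-88*028:01", "el_score": 0.23},
    {"peptide": "HCIGSVASY", "allele": "DLA-88*028:01", "el_score": 0.23},
    {"peptide": "RRPLHTHTP", "allele": "DLA-79*001:05", "el_score": 0.13},
]
top3 = select_epitopes(class1, 3)
```

Here GDFRHTVHV appears once in the result even though it is predicted to bind two alleles, so the cassette does not encode the same peptide twice.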
Cassette Protein
- Total length: 254 aa
- Molecular weight: 26,709 Da
- Instability index: 25.6 (Stable)
- GRAVY: −0.19 (Hydrophilic)
Instability < 40 indicates stability. GRAVY < 0 indicates good solubility.
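GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy value over all residues. The sketch below shows the calculation on a single epitope from the tables, since the full cassette sequence is not given here. It does not reproduce the instability index, which needs a much larger dipeptide weight table.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy of a protein sequence; negative values are hydrophilic."""
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein.upper()) / len(protein)


# Example on one epitope from the candidate tables (not the full cassette).
epitope_gravy = gravy("GDFRHTVHV")
is_hydrophilic = epitope_gravy < 0
```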
Architecture
- Signal peptide: tPA (23 aa)
- Class I linker: AAY (proteasome)
- Class II linker: GPGPG (flexible)
- Helper epitope: PADRE (13 aa)
Multi-epitope design with proteasomal cleavage sites and universal CD4+ helper.
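A multi-epitope cassette of this kind can be assembled by string concatenation. The sketch below is one plausible layout, not the pipeline's confirmed design: the ordering (signal peptide, Class I block, Class II block, helper epitope last) and the use of a GPGPG linker between the two blocks are assumptions, and the function name is hypothetical. The signal peptide and helper sequences are passed in as parameters rather than hard-coded.

```python
def assemble_cassette(signal_peptide, class1_epitopes, class2_epitopes,
                      helper_epitope, class1_linker="AAY", class2_linker="GPGPG"):
    """Concatenate epitopes into one protein sequence with class-specific linkers."""
    # Class I epitopes are separated by the proteasomal cleavage linker.
    class1_block = class1_linker.join(class1_epitopes)
    # Class II epitopes and the helper epitope share the flexible linker.
    class2_block = class2_linker.join(list(class2_epitopes) + [helper_epitope])
    # Assumption: the two blocks are bridged by the flexible linker.
    return signal_peptide + class1_block + class2_linker + class2_block


# Illustration with two epitopes per class from the candidate tables;
# "SIGNAL" and "HELPER" are placeholders, not real sequences.
demo = assemble_cassette(
    "SIGNAL",
    ["GDFRHTVHV", "ATVDGFQGR"],
    ["LGDFRHTVHVGRGGD", "PAEVATVDGFQGRQK"],
    "HELPER",
)
```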
DNA Construct & Optimization
Codon Optimization
- DNA length: 765 bp
- GC content: 75.0%
- T(U) content: 11.2%
- Stop codon: TGA ✓
GC-max / U-min strategy for mRNA stability and reduced innate immune activation.
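The figures in this card can be checked with basic sequence arithmetic: a 254 aa protein plus one stop codon corresponds to 3 × 255 = 765 bp, and GC and T content are simple base fractions. A minimal sketch with hypothetical function names, using a short toy sequence because the construct's DNA sequence is not reproduced here:

```python
def base_fraction(seq, bases):
    """Fraction of positions in seq occupied by any of the given bases."""
    seq = seq.upper()
    return sum(seq.count(b) for b in bases) / len(seq)


def expected_dna_length(protein_length_aa, stop_codons=1):
    """Coding length in bp: three bases per residue plus the stop codon(s)."""
    return 3 * (protein_length_aa + stop_codons)


# Toy sequence for illustration only (ATG GCC GTG TGA).
toy = "ATGGCCGTGTGA"
toy_gc = base_fraction(toy, "GC")
toy_t = base_fraction(toy, "T")
cassette_bp = expected_dna_length(254)
```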
3D Structure (ESMFold)
- Mean pLDDT: 0.29
- Interpretation: Expected
- GPU: RTX 3090
- Inference time: 5.5 sec
Low pLDDT is normal for synthetic multi-epitope vaccine constructs.
Quality Verification
- Round-trip translation: ✓ Verified
- Isoelectric point: 7.76 (Neutral)
- Codon table: Canine (GC-max)
- Aromaticity: 0.102
DNA→Protein round-trip confirms zero translation errors.
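A round-trip check translates the optimized DNA back to protein and compares it with the designed sequence. The sketch below builds the standard genetic code table and runs the check on a toy sequence. The function names are hypothetical, and the pipeline's own verification code is not shown in this report.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a coding DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def round_trip_ok(dna, protein):
    """True if dna encodes exactly the protein followed by a single stop codon."""
    return translate(dna) == protein + "*"
```

For example, `round_trip_ok("ATGGCCGTGTGA", "MAV")` holds because ATG, GCC, GTG, and TGA encode M, A, V, and stop.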
Top Epitope Candidates (Class I & II Highlights)
This table highlights the top MHC Class I and II binding predictions for missense variants that are actively expressed (VALIDATED) in the tumor. A positive Differential Agretopicity Index (DAI) indicates that the mutant peptide binds more strongly than its wildtype counterpart.
| Gene | Allele | Class | Change | Peptide | Mutant EL | DAI |
|---|---|---|---|---|---|---|
| PIK3CA (High DAI) | DRB1_006_01 | II | H1047R | QEALEYFMKQMNDAR | 0.421 | +0.0306 |
| CDC42EP1 (High DAI) | 64_001_05_01 | I | M49V | GDFRHTVHV | 0.380 | +0.0279 |
| CDC42EP1 (High DAI) | 79_001_05 | I | M49V | GDFRHTVHV | 0.380 | +0.0435 |
| CDC42EP1 (High DAI) | DRB1_006_01 | II | M49V | LGDFRHTVHVGRGGD | 0.334 | +0.1284 |
| GLI3 | 88_028_01 | I | V1096M | FLPDDMVQY | 0.321 | -0.0470 |
| GLI3 | DRB1_006_01 | II | V1096M | PDDMVQYLNSQNPAG | 0.303 | -0.0020 |
| SETX | 79_001_05 | I | D2395A | AEVATVDGF | 0.299 | -0.1229 |
| SETX (High DAI) | DRB1_015_02 | II | D2395A | PAEVATVDGFQGRQK | 0.261 | +0.1147 |
Rows marked High DAI have a positive DAI: the mutant peptide is predicted to bind more strongly than its wildtype counterpart, which makes these epitopes the preferred vaccine targets.
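The DAI values can be read as a difference between mutant and wildtype binding scores. The sketch below assumes DAI = mutant EL minus wildtype EL, which matches the sign convention described for the table but is not spelled out in the source (other definitions based on affinity ratios or rank differences exist). Under that assumption, the wildtype EL implied by each row can be recovered from the two reported columns.

```python
def dai(mutant_el, wildtype_el):
    """Differential Agretopicity Index, assumed here to be mutant EL - wildtype EL."""
    return mutant_el - wildtype_el


def implied_wildtype_el(mutant_el, dai_value):
    """Wildtype EL score implied by a reported mutant EL and DAI."""
    return mutant_el - dai_value


def is_high_dai(dai_value):
    """Rows with a positive DAI are the ones marked High DAI in the table."""
    return dai_value > 0


# (gene, allele, mutant EL, DAI) copied from two rows of the table.
rows = [
    ("CDC42EP1", "79_001_05", 0.380, 0.0435),
    ("SETX", "79_001_05", 0.299, -0.1229),
]
flags = [is_high_dai(d) for _, _, _, d in rows]
```

For the CDC42EP1 row on 79_001_05, the implied wildtype EL would be 0.380 − 0.0435 = 0.3365 under this definition.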