
Pipeline Results

End-to-end metrics from the WeCanVax neoantigen discovery pipeline, covering somatic mutation profiling (Module 1), RNA-seq expression validation (Module 2), MHC/DLA genotyping (Module 3), MHC binding prediction (Module 4), and vaccine cassette design (Module 5).

Candidate Discovery Funnel

The pipeline narrows thousands of raw variant calls down to a handful of tumor-specific target epitopes that are predicted to be immunogenic. The 27 protein-altering mutations are evaluated along two parallel tracks: each is checked for RNA expression, and at the same time all mutated regions are translated and run through 20,796 MHC binding predictions. The intersection of these two result sets (preferring highly expressed alleles but allowing fallbacks) yields the vaccine targets.

1. Variant Discovery

  • Raw Variants: 9,876
  • High Confidence: 356
  • Protein Altering: 27

Mutect2 unfiltered calls filtered down to coding mutations only.

2. Parallel Evaluation

  • Tumor Expressed: 21
  • Total Predictions: 20,796
  • MHC Binders: 394

RNA validated vs MHC binding predictions derived from the 27 mutations.

3. Final Selection

  • Validated Binders: 90
  • Vaccine Targets: 15
  • Final Cassette: 1 DNA Construct

Intersection yielding 10 Class I and 5 Class II targets, assembled into a 765 bp construct.
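The intersection step described above can be sketched as a simple set filter. This is a minimal illustration, not the pipeline's actual code: the `Binder` record, the `variant_id` key format, and the function name are all hypothetical, and the fallback logic for weakly expressed alleles is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binder:
    variant_id: str   # hypothetical key linking a peptide to its source mutation
    peptide: str
    el_score: float   # predicted binding score from the MHC predictor


def select_validated_binders(expressed_variants, binders):
    """Keep binders whose source variant is RNA-expressed, best EL score first."""
    expressed = set(expressed_variants)
    kept = [b for b in binders if b.variant_id in expressed]
    return sorted(kept, key=lambda b: b.el_score, reverse=True)


# Illustrative input: two entries taken from the candidate table, one made up.
expressed = {"CDC42EP1:M49V", "SETX:D2395A"}
binders = [
    Binder("SETX:D2395A", "ATVDGFQGR", 0.23),
    Binder("CDC42EP1:M49V", "GDFRHTVHV", 0.38),
    Binder("EXAMPLE:X1Y", "AAAAAAAAA", 0.45),  # made-up, not RNA-expressed
]
for b in select_validated_binders(expressed, binders):
    print(b.variant_id, b.peptide, b.el_score)
```

The made-up third binder has the highest score but is dropped because its source variant is not in the expressed set, which is the behaviour the funnel relies on.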

Module 1 — Somatic Mutation Profiling

QC & Trimming

Metric | Target | Normal SRR7780918 | Tumor SRR7780919
Input reads (PE) | ≥ 100M (Fresh) / ≥ 300M (FFPE) | 38.5M × 2 | 90.0M × 2
Reads passed | ≥ 90% of Input | 73.2M | 172.1M
Low quality removed | < 5% | 3.7M | 7.6M
Adapter/Short/N | < 5% | 59.1K | 278.9K
Q30 bases | ≥ 80% | 92.35% | 91.71%
GC Content | 40 - 55% | 48.02% | 48.01%
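The percentage targets in the QC table can be checked against the absolute read counts. The sketch below assumes that "× 2" denotes paired-end input (so 38.5M pairs is 77.0M reads) and that the "Reads passed" and "Low quality removed" rows count individual reads; both are readings of the table, not something the page states. The function and key names are illustrative.

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Read counts in millions, taken from the QC & Trimming table.
SAMPLES = {
    "Normal SRR7780918": {"input_pairs": 38.5, "passed": 73.2, "low_quality": 3.7},
    "Tumor SRR7780919": {"input_pairs": 90.0, "passed": 172.1, "low_quality": 7.6},
}


def qc_summary(sample):
    """Compare pass and low-quality rates with the stated targets."""
    total_reads = 2 * sample["input_pairs"]  # assumption: paired-end, two reads per pair
    passed_pct = pct(sample["passed"], total_reads)
    low_quality_pct = pct(sample["low_quality"], total_reads)
    return {
        "passed_pct": passed_pct,
        "low_quality_pct": low_quality_pct,
        "passed_ok": passed_pct >= 90.0,          # target: >= 90% of input
        "low_quality_ok": low_quality_pct < 5.0,  # target: < 5%
    }


for name, sample in SAMPLES.items():
    summary = qc_summary(sample)
    print(f"{name}: passed {summary['passed_pct']:.1f}%, "
          f"low quality {summary['low_quality_pct']:.1f}%")
```

Under those assumptions both samples pass roughly 95% of input reads and lose a little under 5% to low quality, so both meet the stated targets.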

Alignment

Metric | Target | Normal SRR7780918 | Tumor SRR7780919
Total reads | - | 73.4M | 172.4M
Target Coverage (est.) | ≥ 100x (N) / ≥ 300x (T) | ~127x | ~303x
Mapping rate | ≥ 95% | 99.72% | 99.78%
Properly paired | ≥ 90% | 98.96% | 99.19%
Duplicates | ≤ 20% | 5.04M (13.8%) | 10.5M (12.3%)
Singletons | < 1% | 0.09% | 0.06%

Variant Calling & Annotation Process

  • Raw Unfiltered (Mutect2): 9,876
  • PASS Variants (FilterCalls): 356
  • Annotated (VEP): 338
  • Coding: 27

Top Coding Variants

Gene | Location | Mutation | Protein change | Consequence | Normal VAF | Normal AD/DP | Tumor VAF | Tumor AD/DP
- | 18:55018082 | A > T | - | intron variant | 10.0% | 0/8 | 45.7% | 11/24
- | 4:17146493 | T > C | p.Lys20Arg | missense variant | 6.5% | 0/10 | 42.9% | 7/16
- | 4:17146494 | T > TGGAAGGTGGGCAGGGGGCACAGAAAGGGGGAAGGCGTCAGGTCGCCCGGGGCGCTCGCCCTGGACC | p.Gly19_Lys20insGlyProGlyArgAlaProArgAlaThrTer | stop gained & inframe insertion | 6.5% | 0/10 | 42.9% | 7/16
OR6C3I | 3:31291411 | C > G | p.Gln294Glu | missense variant | 6.6% | 0/12 | 36.4% | 7/20
- | 26:27309237 | C > T | p.Thr15Ile | missense variant & splice region variant | 9.1% | 0/8 | 34.5% | 13/37
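For reference, the raw allele fraction implied by an AD/DP pair is simply alt reads divided by depth. The sketch below computes that ratio for the tumor columns. Note that the reported VAF values do not exactly equal the raw ratios (for example 11/24 is about 45.8%, and the normal sample shows non-zero VAF with zero alt reads), which suggests the VAF column is a caller-side estimate rather than a plain ratio; that interpretation is an assumption, and `raw_vaf` is an illustrative helper, not part of the pipeline.

```python
def raw_vaf(ad_dp):
    """Return alt_reads / depth for an 'AD/DP' string, or None if depth is zero."""
    alt, depth = (int(x) for x in ad_dp.split("/"))
    if depth == 0:
        return None
    return alt / depth


# Tumor AD/DP values from the Top Coding Variants table.
for location, ad_dp in [
    ("18:55018082", "11/24"),
    ("4:17146493", "7/16"),
    ("3:31291411", "7/20"),
    ("26:27309237", "13/37"),
]:
    print(f"{location}: raw VAF {100 * raw_vaf(ad_dp):.1f}%")
```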

Module 2 — RNA-seq Expression Validation

STAR Alignment & Expression

Alignment Stats

  • Total read pairs: 33.7M
  • Uniquely mapped: 93.38%
  • Multi-mapped: 3.62%
  • Overall mapped: 97.00%

STAR two-pass alignment with TranscriptomeSAM for RSEM.

Expression Quantification

  • Tool: RSEM
  • Transcripts quantified: 60,994
  • EM convergence: 6,122 rounds
  • Top gene TPM: 42,740

RSEM gene-level TPM used for expression filtering.

Variant Validation

RNA Validation Summary

  • Total PASS variants: 356
  • VALIDATED (alt ≥ 2, VAF ≥ 5%): 21
  • WEAK_SUPPORT: 10
  • NOT_EXPRESSED: 30
  • NO_COVERAGE: 295

bam-readcount for SNVs, bcftools mpileup for indels.
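A minimal sketch of how the four validation categories could be assigned from RNA read counts. Only the VALIDATED rule (alt ≥ 2, VAF ≥ 5%) comes from the summary above; the definitions used here for NO_COVERAGE, NOT_EXPRESSED, and WEAK_SUPPORT are assumptions chosen to be consistent with the category names, and the function name is hypothetical.

```python
def classify_rna_support(alt_reads, depth, min_alt=2, min_vaf=0.05):
    """Assign an RNA validation category to one variant.

    Only the VALIDATED thresholds are documented; the other three
    branches are assumed definitions.
    """
    if depth == 0:
        return "NO_COVERAGE"      # assumption: no RNA reads at the site
    if alt_reads == 0:
        return "NOT_EXPRESSED"    # assumption: covered, but no alt-supporting reads
    if alt_reads >= min_alt and alt_reads / depth >= min_vaf:
        return "VALIDATED"        # documented rule: alt >= 2 and VAF >= 5%
    return "WEAK_SUPPORT"         # assumption: some alt reads, below threshold


print(classify_rna_support(alt_reads=5, depth=40))   # VALIDATED
print(classify_rna_support(alt_reads=1, depth=40))   # WEAK_SUPPORT
print(classify_rna_support(alt_reads=0, depth=40))   # NOT_EXPRESSED
print(classify_rna_support(alt_reads=0, depth=0))    # NO_COVERAGE
```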

Expression Integration

  • VALIDATED + TPM ≥ 1: 16
  • VALIDATED + TPM < 1: 5
  • Strong candidates: 16

Strong candidates: RNA-validated with sufficient gene expression.

Validation Rate

  • RNA validated / PASS: 5.9%
  • With expression / validated: 76.2%
  • NO_COVERAGE rate: 82.9%

High NO_COVERAGE is expected — most variants fall in regions not captured by RNA-seq.
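The three rates above follow directly from the category counts in the RNA Validation Summary. A short worked check, with variable names chosen for illustration:

```python
# Category counts from the RNA Validation Summary.
counts = {
    "VALIDATED": 21,
    "WEAK_SUPPORT": 10,
    "NOT_EXPRESSED": 30,
    "NO_COVERAGE": 295,
}
validated_with_expression = 16  # VALIDATED and TPM >= 1

total_pass = sum(counts.values())  # should equal the 356 PASS variants


def rate(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


rna_validated_rate = rate(counts["VALIDATED"], total_pass)
with_expression_rate = rate(validated_with_expression, counts["VALIDATED"])
no_coverage_rate = rate(counts["NO_COVERAGE"], total_pass)

print(total_pass, rna_validated_rate, with_expression_rate, no_coverage_rate)
```

The four categories sum to 356, and the computed rates round to 5.9%, 76.2%, and 82.9%, matching the figures listed.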

Module 3 — MHC/DLA Genotyping

Accurate neoantigen prediction requires knowing the exact MHC (Dog Leukocyte Antigen, DLA) alleles of the patient. The pipeline uses both WES (for comprehensive mapping) and RNA-seq (to confirm active expression) to resolve the highly polymorphic DLA loci. The table below highlights the confirmed and actively expressed alleles essential for accurate Class I (CD8+) and Class II (CD4+) epitope prediction.

Expressed DLA Genotypes

Locus | Confirmed Allele | Coverage (breadth) | Mean Depth (WES) | RNA Expression (TPM) | Status
Class I (CD8+ Targets)
DLA-88 | DLA-88*028:01 | 56.0% | 1.05x | 85,683 | Expressed
DLA-12 | DLA-12*001:01:01 | 65.0% | 6,640.7x | 298,224 | Expressed
DLA-12 | DLA-12*001:04:01 | 74.5% | 1.11x | 5,426 | Expressed
DLA-12 | DLA-12*005:01 | 55.5% | 0.58x | 527 | Low Expr
DLA-64 | DLA-64*001:01:01 | 98.8% | 11.5x | 63,887 | Expressed
DLA-64 | DLA-64*001:02 | 99.3% | 10.0x | 8,534 | Expressed
DLA-64 | DLA-64*001:05:01 | 92.7% | 8.26x | 3,151 | Expressed
DLA-79 | DLA-79*001:05 | 100% | 111.7x | 10,483 | Expressed
DLA-79 | DLA-79*001:02 | 100% | 8.55x | 2,837 | Expressed
Class II (CD4+ Targets)
DLA-DRB1 | DLA-DRB1*006:01 | 100% | 9.54x | 142,371 | Expressed
DLA-DRB1 | DLA-DRB1*015:02 | 100% | 6.83x | 137,877 | Expressed
DLA-DQA1 | DLA-DQA1*005:01:1 | 100% | 6.64x | 11,295 | Expressed
DLA-DQA1 | DLA-DQA1*006:01 | 79.6% | 0.79x | 14,145 | Expressed
DLA-DQB1 | DLA-DQB1*023:01 | 98.5% | 5.28x | 44,235 | Expressed
DLA-DQB1 | DLA-DQB1*007:01 | 100% | 9.94x | 32,078 | Expressed

Module 4 — Neoantigen Prediction & Ranking

MHC Binding Predictions

Class I (netMHCpan-4.2)

  • Binding candidates: 222
  • Strong binders (EL ≥ 0.5): 0
  • Weak binders (EL 0.1-0.5): 222
  • RNA Validated binders: 34

Predicted binding to patient DLA Class I alleles (8-11mer peptides).

Class II (netMHCIIpan-4.2)

  • Binding candidates: 172
  • Strong binders (EL ≥ 0.5): 1
  • Weak/Marginal binders: 171
  • RNA Validated binders: 56

Predicted binding to patient DLA Class II alleles (15-mer peptides).
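The strong/weak split used in the summaries above can be expressed as a threshold function on the EL score. The 0.5 and 0.1 cut-offs are the ones shown for Class I; applying the same 0.1 floor to Class II "Weak/Marginal" binders is an assumption, since no lower bound is given for that class. The function and label names are illustrative.

```python
def classify_binder(el_score, strong_cutoff=0.5, weak_cutoff=0.1):
    """Label a peptide by its predicted EL score.

    Cut-offs follow the Class I summary (strong: EL >= 0.5, weak: 0.1 to 0.5).
    """
    if el_score >= strong_cutoff:
        return "STRONG"
    if el_score >= weak_cutoff:
        return "WEAK"
    return "NON_BINDER"


# EL scores taken from the ranked candidate table.
for peptide, el in [("GDFRHTVHV", 0.38), ("ATVDGFQGR", 0.23), ("RRPLHTHTP", 0.13)]:
    print(peptide, el, classify_binder(el))
```

All three example peptides land in the weak bin, consistent with the Class I summary reporting no strong binders.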

Top Ranked Neoantigen Candidates

Source Mutation (Gene & AA Change) | DLA Allele | Epitope Sequence | Binding Affinity (EL Score) | DAI Score | Validation
Class I (CD8+ Targets)
SETX (D2395A) | DLA-88*028:01 | ATVDGFQGR | 0.23 | 0.19 | VALIDATED
- (T15I) | DLA-88*028:01 | HCIGSVASY | 0.23 | 0.14 | VALIDATED
- (W46R) | DLA-79*001:05 | RRPLHTHTP | 0.13 | 0.08 | VALIDATED
CDC42EP1 (M49V) | DLA-79*001:05 | GDFRHTVHV | 0.38 | 0.04 | VALIDATED
CDC42EP1 (M49V) | DLA-64*001:05:01 | GDFRHTVHV | 0.38 | 0.03 | VALIDATED
Class II (CD4+ Targets)
CDC42EP1 (M49V) | DLA-DRB1*006:01 | LGDFRHTVHVGRGGD | 0.33 | 0.13 | VALIDATED
CDC42EP1 (M49V) | DLA-DRB1*006:01 | PLGDFRHTVHVGRGG | 0.26 | 0.12 | VALIDATED
SETX (D2395A) | DLA-DRB1*015:02 | PAEVATVDGFQGRQK | 0.26 | 0.11 | VALIDATED
CDC42EP1 (M49V) | DLA-DRB1*006:01 | GDFRHTVHVGRGGDV | 0.22 | 0.11 | VALIDATED
- (W46R) | DLA-DRB1*006:01 | RRPLHTHTPSLSATR | 0.28 | 0.11 | VALIDATED
SETX (D2395A) | DLA-DQA1*006:01/DQB1*023:01 | GPAEVATVDGFQGRQ | 0.12 | 0.07 | VALIDATED

Module 5 — Vaccine Cassette Design & Assembly

Epitope Selection & Cassette

Selected Epitopes

  • Class I (CD8+): 10 epitopes
  • Class II (CD4+): 5 epitopes
  • All RNA-VALIDATED
  • Source genes: PIK3CA, CDC42EP1, GLI3, SETX

Top epitopes by EL_SCORE from Module 4, deduplicated by peptide sequence.

Cassette Protein

  • Total length: 254 aa
  • Molecular weight: 26,709 Da
  • Instability index: 25.6 (Stable)
  • GRAVY: −0.19 (Hydrophilic)

Instability < 40 indicates stability. GRAVY < 0 indicates good solubility.
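GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy value per residue. The sketch below computes it in plain Python for a single epitope, since the full 254 aa cassette sequence is not reproduced on this page; it is an illustration of the metric, not the tool the pipeline used.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean Kyte-Doolittle hydropathy; negative values indicate a hydrophilic sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


# One Class I epitope from the candidate table, used only as sample input.
print(f"GDFRHTVHV GRAVY = {gravy('GDFRHTVHV'):.2f}")
```

If Biopython is available, `Bio.SeqUtils.ProtParam.ProteinAnalysis` provides `gravy()`, `instability_index()`, `isoelectric_point()`, `aromaticity()`, and `molecular_weight()` for the same family of metrics; whether the pipeline used it is not stated here.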

Architecture

  • Signal peptide: tPA (23 aa)
  • Class I linker: AAY (proteasome)
  • Class II linker: GPGPG (flexible)
  • Helper epitope: PADRE (13 aa)

Multi-epitope design with proteasomal cleavage sites and universal CD4+ helper.
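A minimal sketch of how such a cassette could be concatenated from its parts. The element order (signal peptide, Class I block, Class II block, helper epitope) and the placement of linkers between blocks are assumptions; the page lists the components but not their arrangement. The tPA and PADRE sequences below are the commonly cited ones and are not confirmed by this page, and only two epitopes per class are used to keep the example short.

```python
def assemble_cassette(signal_peptide, class1_epitopes, class2_epitopes, helper,
                      class1_linker="AAY", class2_linker="GPGPG"):
    """Join epitopes with class-specific linkers (assumed layout, see lead-in)."""
    class1_block = class1_linker.join(class1_epitopes)
    class2_block = class2_linker.join(class2_epitopes)
    parts = [signal_peptide, class1_block, class2_linker,
             class2_block, class2_linker, helper]
    return "".join(parts)


# Commonly cited sequences, used here as assumptions.
TPA_SIGNAL = "MDAMKRGLCCVLLLCGAVFVSPS"  # 23 aa
PADRE = "AKFVAAWTLKAAA"                 # 13 aa

cassette = assemble_cassette(
    TPA_SIGNAL,
    ["ATVDGFQGR", "HCIGSVASY"],               # two Class I epitopes from the table
    ["LGDFRHTVHVGRGGD", "PAEVATVDGFQGRQK"],   # two Class II epitopes from the table
    PADRE,
)
print(len(cassette), cassette)
```

Because this example uses 4 of the 15 selected epitopes and an assumed layout, its length will not match the 254 aa reported for the real cassette.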

DNA Construct & Optimization

Codon Optimization

  • DNA length: 765 bp
  • GC content: 75.0%
  • T(U) content: 11.2%
  • Stop codon: TGA ✓

GC-max / U-min strategy for mRNA stability and reduced innate immune activation.
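GC content and T(U) content are plain base fractions of the coding sequence. The sketch below shows the calculation on a short made-up sequence; the 765 bp construct itself is not reproduced on this page, so the reported 75.0% and 11.2% cannot be recomputed here.

```python
def base_fraction(dna, bases):
    """Fraction of positions in dna that match any character in bases."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return sum(dna.count(b) for b in bases) / len(dna)


example = "ATGGCCGCCTGA"  # made-up 12 bp sequence for illustration only
print(f"GC content: {100 * base_fraction(example, 'GC'):.1f}%")
print(f"T content:  {100 * base_fraction(example, 'T'):.1f}%")
```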

3D Structure (ESMFold)

  • mean pLDDT: 0.29
  • Interpretation: Expected
  • GPU: RTX 3090
  • Inference time: 5.5 sec

Low pLDDT is normal for synthetic multi-epitope vaccine constructs.

Quality Verification

  • Round-trip translation: ✓ Verified
  • Isoelectric point: 7.76 (Neutral)
  • Codon table: Canine (GC-max)
  • Aromaticity: 0.102

DNA→Protein round-trip confirms zero translation errors.
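A round-trip check of this kind translates the optimized DNA back to protein and compares it with the intended cassette sequence. The sketch below uses the standard genetic code and a short made-up construct; the function names are illustrative and the pipeline's own verification code is not shown on this page.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def round_trip_ok(dna, expected_protein):
    """True if dna encodes expected_protein and ends with a stop codon."""
    return translate(dna) == expected_protein and dna.upper()[-3:] in STOP_CODONS


example_dna = "ATGGGCGACTTCTGA"  # made-up: M G D F followed by a TGA stop
print(translate(example_dna), round_trip_ok(example_dna, "MGDF"))
```

The same length relation holds for the real construct: 254 codons plus one stop codon gives 765 bp.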

Top Epitope Candidates (Class I & II Highlights)

Highlights of the top MHC Class I and II binding predictions for missense variants that are actively expressed (VALIDATED) in the tumor. A positive Differential Agretopicity Index (DAI) indicates that the mutant peptide is predicted to bind more strongly than its wildtype counterpart.

Gene | Highlight | Allele | Class | Change | Peptide | Mutant EL | DAI
PIK3CA | High DAI | DRB1_006_01 | II | H1047R | QEALEYFMKQMNDAR | 0.421 | +0.0306
CDC42EP1 | High DAI | 64_001_05_01 | I | M49V | GDFRHTVHV | 0.380 | +0.0279
CDC42EP1 | High DAI | 79_001_05 | I | M49V | GDFRHTVHV | 0.380 | +0.0435
CDC42EP1 | High DAI | DRB1_006_01 | II | M49V | LGDFRHTVHVGRGGD | 0.334 | +0.1284
GLI3 | - | 88_028_01 | I | V1096M | FLPDDMVQY | 0.321 | -0.0470
GLI3 | - | DRB1_006_01 | II | V1096M | PDDMVQYLNSQNPAG | 0.303 | -0.0020
SETX | - | 79_001_05 | I | D2395A | AEVATVDGF | 0.299 | -0.1229
SETX | High DAI | DRB1_015_02 | II | D2395A | PAEVATVDGFQGRQK | 0.261 | +0.1147

Rows marked High DAI are those where the mutant peptide is predicted to bind more strongly than its wildtype counterpart, which makes these epitopes the preferred vaccine targets.
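One simple way to obtain a DAI with the sign convention described above is to subtract the wildtype EL score from the mutant EL score. Whether the pipeline uses this exact difference (rather than, say, a ratio or a rank-based variant) is not stated on this page, so the formula below is an assumption, and the wildtype score in the example is hypothetical.

```python
def dai(mutant_el, wildtype_el):
    """Differential agretopicity as mutant EL minus wildtype EL (assumed definition).

    Positive values mean the mutant peptide is predicted to bind more strongly.
    """
    return mutant_el - wildtype_el


def mutant_binds_stronger(mutant_el, wildtype_el):
    """True when the assumed DAI is positive."""
    return dai(mutant_el, wildtype_el) > 0


# Mutant EL from the table; the wildtype EL is a hypothetical value for illustration.
mutant_el = 0.334
hypothetical_wildtype_el = 0.20
print(f"DAI = {dai(mutant_el, hypothetical_wildtype_el):+.3f}",
      mutant_binds_stronger(mutant_el, hypothetical_wildtype_el))
```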