CScape predicts the oncogenic status (disease-driver or neutral) of somatic point mutations
in the coding and non-coding regions of the cancer genome.
Enter a mutation or list of mutations (one per line) into the form below using the format
chromosome,position,reference,mutant (see Help for more details).
Mutations uploaded from a file should use the VCF format with a minimum of five columns
(chromosome, position, id, reference, mutant).
Note: if a VCF file is uploaded, any entries in the User Input field will be ignored.
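A list in the chromosome,position,reference,mutant format can be rewritten as the five VCF columns described above. The following is a minimal sketch, not part of CScape: the function names are hypothetical, fields are assumed to be tab-delimited, and the id column is filled with "." (the usual VCF convention for a missing identifier; whether the upload form accepts it is an assumption).

```python
from typing import Iterable


def query_to_vcf_line(query: str, identifier: str = ".") -> str:
    """Convert 'chromosome,position,reference,mutant' to five tab-delimited VCF columns."""
    fields = [field.strip() for field in query.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}: {query!r}")
    chromosome, position, reference, mutant = fields
    # Column order follows the text: chromosome, position, id, reference, mutant.
    return "\t".join([chromosome, position, identifier, reference, mutant])


def queries_to_vcf(queries: Iterable[str]) -> str:
    """Convert several queries, skipping blank lines, to newline-terminated VCF rows."""
    lines = [query_to_vcf_line(q) for q in queries if q.strip()]
    return "".join(line + "\n" for line in lines)
```

For example, queries_to_vcf(["11,219046,A,C"]) produces a single row whose third column is the "." placeholder.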
July 2018: liftOver conversion of the database has now been performed
to provide predictions for GRCh38/hg38. Check the box below to access them.
Our software accepts comma-separated mutation data in the following format:
Chromosome
Position
Reference Base
Mutant Base
Example:
11,219046,A,C
11,224139,A,T
11,375885,G,T
11,408898,A,T
11,499190,G,C
11,551832,C,A
11,607532,C,T
11,773638,A,T
11,800755,C,A
11,828599,C,G
11,988551,G,C
11,1025084,C,G
11,1027680,C,A
17,46827903,A,G
17,79060569,A,G
18,756761,C,A
18,3879501,C,A
19,407408,G,T
19,407519,G,C
19,407627,G,A
19,757693,C,A
19,757792,G,T
19,812882,G,T
2,45966,C,A
20,9048655,A,G
20,9923941,A,G
20,18479366,A,G
20,53170414,T,C
3,48265219,A,G
3,52848428,A,G
3,66659209,A,G
3,184195375,A,G
7,193598,C,T
9,916799,C,T
9,3324019,A,T
9,5050791,G,T
9,5077554,C,T
9,6013277,T,A
9,6550908,C,A
9,6554763,C,A
Uploaded VCF files should provide the following columns:
Chromosome
Position
Identifier
Reference Base
Mutant Base
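Entries in the comma-separated format can be checked locally before they are submitted. The sketch below is illustrative only: the Mutation type and function names are hypothetical, and the checks (a positive integer position, single A/C/G/T bases, reference differing from mutant) are assumptions about what a valid point mutation looks like, not the server's own validation rules.

```python
from typing import List, NamedTuple

VALID_BASES = {"A", "C", "G", "T"}


class Mutation(NamedTuple):
    chromosome: str
    position: int
    reference: str
    mutant: str


def parse_mutation(line: str) -> Mutation:
    """Parse one 'chromosome,position,reference,mutant' entry."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    chromosome, position, reference, mutant = fields
    if not (position.isascii() and position.isdigit()) or int(position) < 1:
        raise ValueError(f"position must be a positive integer: {position!r}")
    reference, mutant = reference.upper(), mutant.upper()
    # Assumed checks: single-nucleotide substitutions over A, C, G, T only.
    if reference not in VALID_BASES or mutant not in VALID_BASES:
        raise ValueError(f"bases must be one of A, C, G, T: {line!r}")
    if reference == mutant:
        raise ValueError(f"reference and mutant bases are identical: {line!r}")
    return Mutation(chromosome, int(position), reference, mutant)


def parse_mutations(text: str) -> List[Mutation]:
    """Parse a block of entries, one per line, ignoring blank lines."""
    return [parse_mutation(line) for line in text.splitlines() if line.strip()]
```

Keeping the chromosome as a string leaves room for names that are not plain integers.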
Predictions are given as p-values in the range [0, 1]: values above 0.5 are
predicted to be deleterious, while those below 0.5 are predicted to be neutral or benign.
P-values close to the extremes (0 or 1) are the highest-confidence predictions
and yield the highest accuracy.
We also report cautious classifications, which use the thresholds
that yield the highest possible accuracy (see our paper for details).
These thresholds differ for coding SNVs (0.89 or above)
and noncoding SNVs (0.70 or above).
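The thresholds above can be applied to a prediction as follows. This is a minimal sketch rather than CScape's own code: the function name and the returned keys are hypothetical, and a score of exactly 0.5, which the text does not assign to either class, is treated here as not deleterious.

```python
from typing import Dict

DEFAULT_THRESHOLD = 0.5
# Cautious thresholds quoted in the text ("or above", so the bound is inclusive).
CAUTIOUS_THRESHOLDS = {"coding": 0.89, "noncoding": 0.70}


def classify(score: float, region: str) -> Dict[str, bool]:
    """Apply the default and cautious thresholds to one prediction."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1]: {score}")
    if region not in CAUTIOUS_THRESHOLDS:
        raise ValueError(f"region must be 'coding' or 'noncoding': {region!r}")
    return {
        # Values above 0.5 are predicted deleterious (0.5 itself: assumed not).
        "deleterious": score > DEFAULT_THRESHOLD,
        # Cautious call: at or above the region-specific threshold.
        "cautious_deleterious": score >= CAUTIOUS_THRESHOLDS[region],
    }
```

A score of 0.75, for instance, is deleterious under the default threshold in both regions, but meets the cautious threshold only for a noncoding SNV.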
We use distinct predictors for positions in coding regions (positions
within coding-sequence exons) and in non-coding regions (positions in intergenic
regions, introns or non-coding genes).
In our paper we consider regions of interest in the cancer genome.
For coding regions these are listed in the file coding-regions.tab
as
Python query script (7.5KB)
cscape_coding.vcf.gz (669MB)
cscape_coding.vcf.gz.tbi (664KB)
cscape_noncoding.vcf.gz (48GB)
cscape_noncoding.vcf.gz.tbi (2.3MB)
Usage: cscape_query.py query-file [options]

Predict the oncogenic potential of single nucleotide variants (SNVs).
The query file must be a list of queries that use the following format:
  chromosome,position,reference,mutant
Example:
  1,69094,G,A
  11,168961,T,A
  18,119888,G,A

Options:
  -h, --help  show this help message and exit
  -c CDB      CScape coding database [default: cscape_coding.bed.gz]
  -n NDB      CScape noncoding database [default: cscape_noncoding.bed.gz]
  -o OUTPUT   Output file [default: stdout]
  -v          Verbose mode [default: False]
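The usage text can be turned into a command line programmatically. The sketch below only assembles the arguments documented above (-c, -n, -o, -v); the helper names are hypothetical, and the interpreter name "python" is an assumption, since the required Python version is not stated. Because the default database names in the usage text end in .bed.gz while the downloads listed above end in .vcf.gz, the sketch passes both database paths explicitly.

```python
import subprocess
from typing import List, Optional


def build_query_command(
    query_file: str,
    coding_db: str = "cscape_coding.vcf.gz",
    noncoding_db: str = "cscape_noncoding.vcf.gz",
    output: Optional[str] = None,
    verbose: bool = False,
    interpreter: str = "python",  # assumption: adjust to the interpreter the script needs
) -> List[str]:
    """Assemble the argument list for cscape_query.py from its documented options."""
    command = [interpreter, "cscape_query.py", query_file,
               "-c", coding_db, "-n", noncoding_db]
    if output is not None:
        command += ["-o", output]  # otherwise the script writes to stdout
    if verbose:
        command.append("-v")
    return command


def run_query(command: List[str]) -> subprocess.CompletedProcess:
    """Run the assembled command, raising CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.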
Training and test data used to develop CScape are provided below. Each file within the .zip archives has five (tab-delimited) columns:
Chromosome
Position
Reference
Allele
Label
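Files with this layout can be read with Python's standard csv module. This is a minimal sketch under stated assumptions: the five columns are taken to be chromosome, position, reference, allele and label in that order, the files are assumed to have no header row, and all values are kept as strings because the encoding of the label column is not described here. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

# Assumed column order, following the five names listed in the text.
COLUMNS = ("chromosome", "position", "reference", "allele", "label")


def read_records(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read tab-delimited rows from an open text file into dictionaries."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}: {row}")
        records.append(dict(zip(COLUMNS, row)))
    return records


def label_counts(records: Iterable[Dict[str, str]]) -> Counter:
    """Count how many records carry each label value."""
    return Counter(record["label"] for record in records)
```

After extracting an archive, a file can be opened with open(path, newline="") and passed to read_records; label_counts then gives a quick check of the class balance.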
training_data.zip (820KB)
cscape_coding_tests.zip (840KB)
cscape_noncoding_tests.zip (17MB)
In our paper:
we investigate the frequency count of single nucleotide variants driving common solid tumours. We further discuss predicted driver counts stratified by stage of disease and driver counts in non-coding regions of the cancer genome, in addition to driver genes. The file driver-genes.xlsx gives the full count of single nucleotide variant drivers (SNV-drivers) across 25 different types of cancer, as discussed in this paper.