Data Levels and Data Types
The table below shows the relationship of TCGA data types to data levels as well as information on important metadata.
Please see the TCGA Data Primer for a detailed guide on TCGA data types.
*Red indicates data that are in the controlled-access data tier. Data in this tier require user authorization in order to access them. Please see the Access Tiers page for more information.
Relationship of Data Levels to Data Types
Data Type | Data Subtypes | Technology Platform(s) | Level 1 | Level 2 | Level 3 | Important Metadata |
---|---|---|---|---|---|---|
Clinical Data | | n/a | Clinical information for each participant (including demographic information, treatment information, survival data, etc.). File types: tab-delimited "biotab" (.txt) and .xml | n/a | n/a | The BCR data dictionary describes all the clinical and biospecimen data elements in TCGA |
| | | n/a | Information on how samples from each participant were processed by the Biospecimen Core Resource Center (BCR). File types: tab-delimited "biotab" (.txt) and .xml | n/a | n/a | The BCR data dictionary describes all the clinical and biospecimen data elements in TCGA |
Tissue Slide Images | 1. Diagnostic image | n/a | Tissue images used by the hospital to diagnose the participant. File type: .svs (image viewer) | n/a | n/a | Available images are listed in the biospecimen biotab and xml files |
Tissue Slide Images | 2. Tissue image | n/a | Images of tissue samples from each participant that were used for TCGA analyses. File type: .svs (image viewer) | n/a | n/a | Available images are listed in the biospecimen biotab and xml files |
Pathology Reports | | n/a | Pathology reports for a subset of participants. File type: .pdf | n/a | n/a | n/a |
Microsatellite Instability (MSI) | | n/a | Markers indicating presence or absence of an MSI shift, allele homozygosity/heterozygosity, and loss of heterozygosity (LOH) observed in the tumor sample for each participant. File types: fragment analysis trace file (.fsa) and tab-delimited (.txt) file summarizing the trace file | n/a | Classifications of microsatellite instability detected for each participant's tumor sample. File type: auxiliary.xml | Level 1 data are submitted as part of a standard MAGE-TAB archive. Level 3 data are contained in the BCR clinical data archives |
DNA Sequencing | 1. Whole exome sequence (available at the Cancer Genomics Hub) | IlluminaGA_DNASeq, SOLiD_DNASeq | Whole exome sequence for both tumor and normal sample for each participant. File type: binary alignment file (.bam) | n/a | n/a | Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file |
DNA Sequencing | 2. Whole genome sequence (available at the Cancer Genomics Hub) | IlluminaGA_DNASeq, SOLiD_DNASeq | Whole genome sequence for both tumor and normal sample for select participants. File type: binary alignment file (.bam) | n/a | n/a | Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file |
DNA Sequencing | 3. Sequence traces (may be available at the NCBI Trace Archive) 1 | n/a | Raw sequence output from older sequencing technologies. File type: sequence chromatogram format (.scf) | n/a | n/a | Trace-sample relationship (.tr) files map NCBI trace IDs to TCGA biospecimen barcodes |
DNA Sequencing | 4. Mutations | IlluminaGA_DNASeq, SOLiD_DNASeq | Whole genome and exome sequence (see above) | Validated and unvalidated DNA variants/mutations for each participant. File types: mutation annotation file (.maf) and variant call format (.vcf) | Validated DNA variants/mutations for each participant. File type: mutation annotation file (.maf) | The DESCRIPTION file contains a summary of the mutation detection and validation method. The .maf files do not have a standard MAGE-TAB archive associated with them |
Expression - miRNA Sequencing | 1. miRNA sequence (available at the Cancer Genomics Hub) | IlluminaGA_miRNASeq, IlluminaHiSeq_miRNASeq | miRNA sequence for each participant's tumor sample. File type: binary alignment file (.bam) | n/a | n/a | Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file |
Expression - miRNA Sequencing | 2. miRNA | IlluminaGA_miRNASeq, IlluminaHiSeq_miRNASeq | miRNA sequence for each participant's tumor sample (see above) | n/a | The calculated expression for all reads aligning to a particular miRNA, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
Expression - miRNA Sequencing | 3. Isoform | IlluminaGA_miRNASeq, IlluminaHiSeq_miRNASeq | miRNA sequence for each participant's tumor sample (see above) | n/a | The calculated expression for each individual miRNA sequence isoform observed, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
Expression - Protein Array | | MDA_RPPA_Core | High resolution images of protein array slides (up to 1000 samples per slide) and raw signals per slide. File types: .tiff (image viewer) for images and tab-delimited (.txt) for signals | Dilution curves for each sample. File type: tab-delimited (.txt) | Normalized protein expression for each gene. File type: tab-delimited (.txt) | Array design files, antibody annotations, and the experimental protocol are included in the MAGE-TAB archive |
Expression - mRNA | 1. mRNA sequence (available at the Cancer Genomics Hub) | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | mRNA sequence for each participant's tumor sample. File type: binary alignment file (.bam) | n/a | n/a | Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file |
Expression - mRNA | 2. Exon | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | mRNA sequence for each participant's tumor sample (see above) | n/a | The calculated expression signal of a particular composite exon of a gene. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
Expression - mRNA | 3. Gene | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | mRNA sequence for each participant's tumor sample (see above) | n/a | The calculated expression signal of a gene. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
Expression - mRNA | 4. Splice Junction | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | mRNA sequence for each participant's tumor sample (see above) | n/a | The calculated expression signal of a particular composite splice junction of a gene. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
Expression - mRNA | 5. Isoform | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | mRNA sequence for each participant's tumor sample (see above) | n/a | The normalized expression signal of individual isoforms (transcripts). File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
| | 1. Gene | AgilentG4502A_07_3, AgilentG4502A_07_2, AgilentG4502A_07_1, HT-HG-U133A, HG-U133-Plus2 | Raw signals per probe. File types: binary (.CEL) and tab-delimited (.txt) | Normalized signals per probe or probe set. File type: tab-delimited (.txt) | Expression calls for genes, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the MAGE-TAB archive. Probe information is contained in the Array design files for each platform |
| | 2. Exon | HuEx-1_0-st-v2 | Raw signals per probe. File type: binary (.CEL) | Normalized signals per probe or probe set. File type: tab-delimited (.txt) | Expression calls for exons/variants, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the MAGE-TAB archive. Probe information is contained in the Array design files for each platform |
| | 3. miRNA | H-miRNA_8x15K, H-miRNA_8x15Kv2 | Raw signals per probe. File type: tab-delimited (.txt) | Normalized signals per probe or probe set. File type: tab-delimited (.txt) | Expression calls for miRNAs, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the MAGE-TAB archive. Probe information is contained in the Array design files for each platform |
DNA Methylation | | HumanMethylation27, HumanMethylation450, IlluminaDNAMethylation_OMA002_CPI, IlluminaDNAMethylation_OMA003_CPI | Raw signal intensities of probes. File types: tab-delimited (.txt) and binary (.idat; applies to the HumanMethylation450 platform) | Calculated beta values. File type: tab-delimited (.txt) | Calculated beta values mapped to genome, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the MAGE-TAB archive. Probe information is contained in the Array design files for each platform |
SNP | | Genome_Wide_SNP_6, Human1MDuo, HumanHap550 | Raw data. File types: binary (.CEL), binary (.idat), and tab-delimited (.txt) | Unnormalized SNP, copy number, and LOH data. File type: tab-delimited (.txt) | Normalized copy number and LOH data, per sample. File type: tab-delimited (.txt) | Experimental protocol, including calculation methods, is included in the MAGE-TAB archive. Probe information is contained in the Array design files for each platform |
Copy Number Results | | IlluminaHiSeq_DNASeqC | Low pass, whole genome sequence of both tumor and normal samples for each participant and analysis of differences in read counts between the tumor and normal sample. File type: binary alignment file (.bam) | DNA variants/mutations and copy number variation for each participant. File type: variant call format (.vcf) | Regions with differences in genome coverage (number of reads) between normal and tumor samples for each participant. File type: tab-delimited (.tsv) | Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive |
Copy Number Results | | HG-CGH-244A, HG-CGH-415K_G4124A, CGH-1x1M_G4447A | Raw signals per probe. File type: tab-delimited (.txt) | Normalized signals for copy number alterations of aggregated regions, per probe or probe set. File types: tab-delimited (.tsv and .mat) | Copy number alterations for aggregated/segmented regions, per sample. File types: tab-delimited (.tsv and .txt) | Experimental protocol, including calculation methods, is included in the MAGE-TAB archive. Probe information is contained in the Array design files for each platform |
Copy Number Results | 3. Array based - SNP (see above) |
1 Denotes an older data type/platform that applies to TCGA pilot projects (GBM and Ovarian) only.
2 RNA sequencing has two versions - V1 and V2. Version 2 differs from Version 1 by the algorithm used to generate the data.
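
As an illustration of how the level-to-file-type relationships in the table might be used programmatically, here is a minimal Python sketch. It encodes only three rows of the table as a lookup and reports the data levels at which a given file extension appears. The names `LEVEL_FILE_TYPES` and `levels_for` are hypothetical, the subtype label "Mutations" drops the table's row numbering, and nothing here is part of any TCGA tool or API.

```python
# Hypothetical lookup transcribed from three rows of the table above.
# Keys are (data type, data subtype); subtype is None where the table lists none.
# Values map a data level to the file extensions listed for that level.
LEVEL_FILE_TYPES = {
    ("DNA Sequencing", "Mutations"): {
        1: [".bam"],           # whole genome and exome sequence
        2: [".maf", ".vcf"],   # validated and unvalidated variants
        3: [".maf"],           # validated variants only
    },
    ("Microsatellite Instability (MSI)", None): {
        1: [".fsa", ".txt"],   # trace file and its summary
        3: [".xml"],           # auxiliary.xml classifications
    },
    ("Expression - Protein Array", None): {
        1: [".tiff", ".txt"],  # slide images and raw signals
        2: [".txt"],           # dilution curves
        3: [".txt"],           # normalized protein expression
    },
}


def levels_for(data_type, subtype, extension):
    """Return the sorted data levels at which `extension` is listed for a row."""
    row = LEVEL_FILE_TYPES.get((data_type, subtype), {})
    ext = extension.lower()
    return sorted(level for level, exts in row.items() if ext in exts)


if __name__ == "__main__":
    print(levels_for("DNA Sequencing", "Mutations", ".maf"))
    print(levels_for("Microsatellite Instability (MSI)", None, ".fsa"))
```

Because many rows share the same extension (most Level 2 and Level 3 files are tab-delimited .txt), an extension alone rarely identifies a level, which is why the lookup is keyed by data type and subtype first.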