Monday, February 4, 2013

TCGA Data Levels and Data Types

source: https://tcga-data.nci.nih.gov/tcga/tcgaDataType.jsp


The table below shows the relationship of TCGA data types to data levels as well as information on important metadata.

Please see the TCGA Data Primer for a detailed guide on TCGA data types.

*Red indicates data that are in the controlled-access data tier. Data in this tier require user authorization in order to access them. Please see the Access Tiers page for more information.

Relationship of Data Levels to Data Types

Data Type | Data Subtypes | Technology Platform(s) | Level 1 | Level 2 | Level 3 | Important Metadata

Data Type: Clinical Data

Data Subtype: 1. Clinical data
Technology Platform(s): n/a
Level 1: Clinical information for each participant (including demographic information, treatment information, survival data, etc.)
  File types: tab-delimited "biotab" (.txt) and .xml
Level 2: n/a
Level 3: n/a
Important Metadata: The BCR data dictionary describes all of the clinical and biospecimen data elements in TCGA

Data Subtype: 2. Biospecimen data
Technology Platform(s): n/a
Level 1: Information on how samples from each participant were processed by the Biospecimen Core Resource Center (BCR)
  File types: tab-delimited "biotab" (.txt) and .xml
Level 2: n/a
Level 3: n/a
Important Metadata: The BCR data dictionary describes all of the clinical and biospecimen data elements in TCGA

Data Type: Tissue Slide Images

Data Subtype: 1. Diagnostic image
Technology Platform(s): n/a
Level 1: Tissue images used by the hospital to diagnose the participant
  File type: .svs (image viewer)
Level 2: n/a
Level 3: n/a
Important Metadata: Available images are listed in the biospecimen biotab and xml files

Data Subtype: 2. Tissue image
Technology Platform(s): n/a
Level 1: Images of tissue samples from each participant that were used for TCGA analyses
  File type: .svs (image viewer)
Level 2: n/a
Level 3: n/a
Important Metadata: Available images are listed in the biospecimen biotab and xml files

Data Type: Pathology Reports

Technology Platform(s): n/a
Level 1: Pathology reports for a subset of participants
  File type: .pdf
Level 2: n/a
Level 3: n/a
Important Metadata: n/a

Data Type: Microsatellite Instability (MSI)

Technology Platform(s): n/a
Level 1: Markers indicating presence or absence of an MSI shift, allele homozygosity/heterozygosity, and loss of heterozygosity (LOH) observed in the tumor sample for each participant
  File types: fragment analysis trace file (.fsa) and tab-delimited (.txt) file summarizing the trace file
Level 2: n/a
Level 3: Classifications of microsatellite instability detected for each participant's tumor sample
  File type: auxiliary.xml
Important Metadata: Level 1 data are submitted as part of a standard MAGE-TAB archive
  Level 3 data are contained in the BCR clinical data archives

Data Type: DNA Sequencing

Data Subtype: 1. Whole exome sequence (available at the Cancer Genomics Hub)
Technology Platform(s): IlluminaGA_DNASeq, SOLiD_DNASeq
Level 1: Whole exome sequence for both tumor and normal sample for each participant
  File type: binary alignment file (.bam)
Level 2: n/a
Level 3: n/a
Important Metadata: Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file

Data Subtype: 2. Whole genome sequence (available at the Cancer Genomics Hub)
Technology Platform(s): IlluminaGA_DNASeq, SOLiD_DNASeq
Level 1: Whole genome sequence for both tumor and normal sample for select participants
  File type: binary alignment file (.bam)
Level 2: n/a
Level 3: n/a
Important Metadata: Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file

Data Subtype: 3. Sequence traces (may be available at the NCBI Trace Archive) (see footnote 1)
Technology Platform(s): n/a
Level 1: Raw sequence output from older sequencing technologies
  File type: sequence chromatogram format (.scf)
Level 2: n/a
Level 3: n/a
Important Metadata: Trace-sample relationship (.tr) files map NCBI trace IDs to TCGA biospecimen barcodes

Data Subtype: 4. Mutations
Technology Platform(s): IlluminaGA_DNASeq, SOLiD_DNASeq
Level 1: Whole genome and exome sequence - see above
Level 2: Validated and unvalidated DNA variants/mutations for each participant
  File types: mutation annotation file (.maf) and variant calling file (.vcf)
Level 3: Validated DNA variants/mutations for each participant
  File type: mutation annotation file (.maf)
Important Metadata: The DESCRIPTION file contains a summary of the mutation detection and validation method
  The .maf files do not have a standard MAGE-TAB archive associated with them

Data Type: Expression - miRNA Sequencing

Data Subtype: 1. miRNA sequence (available at the Cancer Genomics Hub)
Technology Platform(s): IlluminaGA_miRNASeq, IlluminaHiSeq_miRNASeq
Level 1: miRNA sequence for each participant's tumor sample
  File type: binary alignment file (.bam)
Level 2: n/a
Level 3: n/a
Important Metadata: Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file

Data Subtype: 2. miRNA
Technology Platform(s): IlluminaGA_miRNASeq, IlluminaHiSeq_miRNASeq
Level 1: miRNA sequence for each participant's tumor sample - see above
Level 2: n/a
Level 3: The calculated expression for all reads aligning to a particular miRNA, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Subtype: 3. Isoform
Technology Platform(s): IlluminaGA_miRNASeq, IlluminaHiSeq_miRNASeq
Level 1: miRNA sequence for each participant's tumor sample - see above
Level 2: n/a
Level 3: The calculated expression for each individual miRNA sequence isoform observed, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Type: Expression - Protein Array

Technology Platform(s): MDA_RPPA_Core
Level 1: High resolution images of protein array slides (up to 1000 samples per slide) and raw signals per slide
  File types: .tiff (image viewer) for images and tab-delimited (.txt) for signals
Level 2: Dilution curves for each sample
  File type: tab-delimited (.txt)
Level 3: Normalized protein expression for each gene
  File type: tab-delimited (.txt)
Important Metadata: Array design files, antibody annotations, and the experimental protocol are included in the MAGE-TAB archive

Data Type: Expression - mRNA Sequencing (see footnote 2)

Data Subtype: 1. mRNA sequence (available at the Cancer Genomics Hub)
Technology Platform(s): IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq
Level 1: mRNA sequence for each participant's tumor sample
  File type: binary alignment file (.bam)
Level 2: n/a
Level 3: n/a
Important Metadata: Experimental protocol, including primer information, is contained in the metadata.xml file associated with each .bam file

Data Subtype: 2. Exon
Technology Platform(s): IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq
Level 1: mRNA sequence for each participant's tumor sample - see above
Level 2: n/a
Level 3: The calculated expression signal of a particular composite exon of a gene
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Subtype: 3. Gene
Technology Platform(s): IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq
Level 1: mRNA sequence for each participant's tumor sample - see above
Level 2: n/a
Level 3: The calculated expression signal of a gene
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Subtype: 4. Splice Junction
Technology Platform(s): IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq
Level 1: mRNA sequence for each participant's tumor sample - see above
Level 2: n/a
Level 3: The calculated expression signal of a particular composite splice junction of a gene
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Subtype: 5. Isoform
Technology Platform(s): IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq
Level 1: mRNA sequence for each participant's tumor sample - see above
Level 2: n/a
Level 3: The normalized expression signal of individual isoforms (transcripts)
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Type: Expression Array (see footnote 1)

Data Subtype: 1. Gene
Technology Platform(s): AgilentG4502A_07_3, AgilentG4502A_07_2, AgilentG4502A_07_1, HT-HG-U133A, HG-U133-Plus2
Level 1: Raw signals per probe
  File types: binary (.CEL) and tab-delimited (.txt)
Level 2: Normalized signals per probe or probe set
  File type: tab-delimited (.txt)
Level 3: Expression calls for genes, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the MAGE-TAB archive
  Probe information is contained in the Array design files for each platform

Data Subtype: 2. Exon
Technology Platform(s): HuEx-1_0-st-v2
Level 1: Raw signals per probe
  File type: binary (.CEL)
Level 2: Normalized signals per probe or probe set
  File type: tab-delimited (.txt)
Level 3: Expression calls for exons/variants, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the MAGE-TAB archive
  Probe information is contained in the Array design files for each platform

Data Subtype: 3. miRNA
Technology Platform(s): H-miRNA_8x15K, H-miRNA_8x15Kv2
Level 1: Raw signals per probe
  File type: tab-delimited (.txt)
Level 2: Normalized signals per probe or probe set
  File type: tab-delimited (.txt)
Level 3: Expression calls for miRNAs, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the MAGE-TAB archive
  Probe information is contained in the Array design files for each platform

Data Type: DNA Methylation

Technology Platform(s): HumanMethylation27, HumanMethylation450, IlluminaDNAMethylation_OMA002_CPI, IlluminaDNAMethylation_OMA003_CPI
Level 1: Raw signal intensities of probes
  File types: tab-delimited (.txt) and binary (.idat; applies to the HumanMethylation450 platform)
Level 2: Calculated beta values
  File type: tab-delimited (.txt)
Level 3: Calculated beta values mapped to genome, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the MAGE-TAB archive
  Probe information is contained in the Array design files for each platform

Data Type: SNP

Technology Platform(s): Genome_Wide_SNP_6, Human1MDuo, HumanHap550
Level 1: Raw data
  File types: binary (.CEL), binary (.idat), and tab-delimited (.txt)
Level 2: Unnormalized SNP, copy number, and LOH data
  File type: tab-delimited (.txt)
Level 3: Normalized copy number and LOH data, per sample
  File type: tab-delimited (.txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the MAGE-TAB archive
  Probe information is contained in the Array design files for each platform

Data Type: Copy Number Results

Data Subtype: 1. Sequencing based
Technology Platform(s): IlluminaHiSeq_DNASeqC
Level 1: Low pass, whole genome sequence of both tumor and normal samples for each participant and analysis of differences in read counts between the tumor and normal sample
  File type: binary alignment file (.bam)
Level 2: DNA variants/mutations and copy number variation for each participant
  File type: variant calling format (.vcf)
Level 3: Regions with differences in genome coverage (number of reads) between normal and tumor samples for each participant
  File type: tab-delimited (.tsv)
Important Metadata: Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive

Data Subtype: 2. Array based - CGH (see footnote 1)
Technology Platform(s): HG-CGH-244A, HG-CGH-415K_G4124A, CGH-1x1M_G4447A
Level 1: Raw signals per probe
  File type: tab-delimited (.txt)
Level 2: Normalized signals for copy number alterations of aggregated regions, per probe or probe set
  File type: tab-delimited (.tsv and .mat)
Level 3: Copy number alterations for aggregated/segmented regions, per sample
  File type: tab-delimited (.tsv and .txt)
Important Metadata: Experimental protocol, including calculation methods, is included in the MAGE-TAB archive
  Probe information is contained in the Array design files for each platform

Data Subtype: 3. Array based - SNP (see above)

1 Denotes an older data type/platform that applies to TCGA pilot projects (GBM and Ovarian) only.
2 RNA sequencing has two versions - V1 and V2. Version 2 differs from Version 1 by the algorithm used to generate the data.
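
The table names a fixed set of file extensions, so identifying what a downloaded file contains can be sketched as a simple lookup. The snippet below is only an illustration: the dictionary FILE_TYPE_DESCRIPTIONS and the function describe_file are hypothetical names, not part of any TCGA tool, and the descriptions are transcribed loosely from the table above. An extension alone does not determine the data level, since several levels share the tab-delimited .txt format.

```python
import os

# Hypothetical lookup, transcribed from the file types named in the table.
# Keys are lowercase extensions; values paraphrase the table's wording.
FILE_TYPE_DESCRIPTIONS = {
    ".bam": "binary alignment file",
    ".maf": "mutation annotation file",
    ".vcf": "variant calling format",
    ".scf": "sequence chromatogram format",
    ".fsa": "fragment analysis trace file",
    ".tr": "trace-sample relationship file",
    ".svs": "tissue slide image",
    ".tiff": "protein array slide image",
    ".pdf": "pathology report",
    ".cel": "binary raw array signals",
    ".idat": "binary raw array signals",
    ".txt": "tab-delimited text",
    ".tsv": "tab-delimited text",
    ".mat": "tab-delimited text",
    ".xml": "XML (clinical, biospecimen, or metadata)",
}


def describe_file(filename):
    """Return the table's description for a file name, or None if unlisted."""
    _, ext = os.path.splitext(filename)
    # Extensions such as .CEL appear in upper case, so compare in lower case.
    return FILE_TYPE_DESCRIPTIONS.get(ext.lower())
```

For example, describe_file("tumor_sample.bam") looks up ".bam", while a name with an extension the table does not mention yields None rather than a guess.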
