Quantcast
Channel: vcf — GATK-Forum
Viewing all articles
Browse latest Browse all 624

invariant calls not consistently included in Unififed Genotyper output

$
0
0

Hello!

I am writing to seek your help on a curious result I received from running the Unified Genotyper. I ran Unified Genotyper to obtain all confident sites (both invariant and variant) on a suite of different bam files. For certain regions of the genome that overlap certain bam file reads, it appears that it did not output invariant sites, only variant sites. I do not know if this is due to differences among bam files, my call to GATK, or is a glitch.

Here is a little more background.
* First, I am using Unified Genotyper to maintain continuity with some analyses I did over a year go.
* My data:
- I have 92 low coverage bam files from samples produced from illumina sequencing a RADtag type enzyme digestion library. The bams contain single-end reads, each 44-bp long. I will refer to these reads as ‘RADtags’.
- I also have 18 high coverage bams produced from illumina sequencing whole genome paired-end libraries, with read lengths varying from 36 to 100. I will refer to these as WGS samples.
- Previously I called SNPs using Unified Genotyper with mode set to EMIT_VARIANTS_ONLY, on all this data at once, to increase the chances of identifying quality SNPs in the low coverage RADtag data and to get genotypes from the high coverage WGS data at the same sites. The data seemed fine.

  • Current issue:
  • I ran the Unified Genotyper with the mode set to EMIT_ALL_CONFIDENT_SITES for all the same data. The vcf includes both invariant and variant calls. However, it seems I am not getting any invariant sites from regions that overlap the RADtag reads. It seems the program is not emitting invariant sites within RADtags, but is reporting invariant sites outside of RADtags and only for WGS samples.

I have attached a subset of my vcf file, and a snapshot image from viewing these samples in igv. In the igv view, the middle samples are the RADtag samples, the higher coverage samples at the top and bottom are the WGS samples. As you can see, there should be plenty of invariant sites in the regions overlapping the RADTAGs and there should be calls that include these data regions and samples.

Can you think of a reason that could be causing this? I have pasted my call to GATK below.

Thank you!
Amanda

java -jar /usr/local/gatk/3.0-0/GenomeAnalysisTK.jar \
-R mg.new.ref.fa \
-T UnifiedGenotyper \
-I AHQT1G_q29.paired.nodup.realign.bam -I BOG10G_q29.paired.nodup.realign.bam -I CAC6G_q29.paired.nodup.realign.bam \
-I CAC9N_q29.paired.nodup.realign.bam -I DUNG_q29.paired.nodup.realign.bam -I IM62.JGI_q29.paired.nodup.realign.bam \
-I KOOTN_q29.paired.nodup.realign.bam -I LMC24G_q29.paired.nodup.realign.bam -I MAR3G_q29.paired.nodup.realign.bam \
-I MED84G_q29.paired.nodup.realign.bam -I MEN104_q29.paired.nodup.realign.bam -I NHN26_q29.paired.nodup.realign.bam \
-I PED5G_q29.paired.nodup.realign.bam -I REM8G_q29.paired.nodup.realign.bam -I SF5N_q29.paired.nodup.realign.bam \
-I SLP9G_q29.paired.nodup.realign.bam -I SWBG_q29.paired.nodup.realign.bam -I YJS6G_q29.paired.nodup.realign.bam \
-I CACid.277_index2.GCTCGG.sortedrealign.bam -I CACid.279b_index2.CTGATT.sortedrealign.bam -I CACid.247_index2.AATACT.sortedrealign.bam -I CACid.237_index2.CGCTGT.sortedrealign.bam \
-I CACid.240b_index2.GTCTCT.sortedrealign.bam -I CACid.272_index2.ATCTCC.sortedrealign.bam -I CACid.282b_index2.TCTGCT.sortedrealign.bam -I CACid.179_index2.TATAGT.sortedrealign.bam \
-I CACid.187_index2.CAGCAT.sortedrealign.bam -I CACid.189_index2.TAATCC.sortedrealign.bam -I CACid.192_index2.GCAGTT.sortedrealign.bam -I CACid.339_index2.CCTGAA.sortedrealign.bam \
-I CACid.235_index2.AAGCGA.sortedrealign.bam -I CACid.236_index2.AACTTA.sortedrealign.bam -I CACid.249_index2.GTTCCT.sortedrealign.bam -I CACid.370_index2.CTAATA.sortedrealign.bam \
-I CACid.372_index2.CCGAAT.sortedrealign.bam -I CACid.215_index2.CTTGTA.sortedrealign.bam -I CACid.218_index2.CCCTCG.sortedrealign.bam -I CACid.219_index2.ACTGAC.sortedrealign.bam \
-I CACid.221_index2.GTGACT.sortedrealign.bam -I CACid.225_index2.ACTAGC.sortedrealign.bam -I CACid.226_index2.CGACTA.sortedrealign.bam -I CACid.231_index2.GTGAGA.sortedrealign.bam \
-I CACid.232_index2.AATCTA.sortedrealign.bam -I CACid.233_index2.CAAGCT.sortedrealign.bam -I CACid.212_index2.CTCTCA.sortedrealign.bam -I CACid.211_index2.ACTCCT.sortedrealign.bam \
-I CACid.403_index2.GATCAA.sortedrealign.bam -I CACid.280_index2.GGCTTA.sortedrealign.bam -I CACid.283_index2.GTGGAA.sortedrealign.bam -I CACid.285_index2.AAATGA.sortedrealign.bam \
-I CACid.003_index2.GGGTAA.sortedrealign.bam -I CACid.007_index2.CGTCAA.sortedrealign.bam -I CACid.008_index2.GTTGCA.sortedrealign.bam -I CACid.055_index2.ATATAC.sortedrealign.bam \
-I CACid.056_index2.TTAGTA.sortedrealign.bam -I CACid.071_index2.TCGCTT.sortedrealign.bam -I CACid.313_index2.TCGTCT.sortedrealign.bam -I CACid.040_index2.CAATAT.sortedrealign.bam \
-I CACid.292_index2.TCATGG.sortedrealign.bam -I CACid.080_index2.CAGGAA.sortedrealign.bam -I CACid.088_index2.ATCGAC.sortedrealign.bam -I CACid.322_index2.TTGCGA.sortedrealign.bam \
-I CACid.114_index1.GCTCGG.sortedrealign.bam -I CACid.279a_index1.CTGATT.sortedrealign.bam -I CACid.181_index1.AATACT.sortedrealign.bam -I CACid.184_index1.CGCTGT.sortedrealign.bam \
-I CACid.185_index1.GTCTCT.sortedrealign.bam -I CACid.191_index1.ATCTCC.sortedrealign.bam -I CACid.194_index1.TCTGCT.sortedrealign.bam -I CACid.240a_index1.TATAGT.sortedrealign.bam \
-I CACid.238_index1.CAGCAT.sortedrealign.bam -I CACid.371_index1.TAATCC.sortedrealign.bam -I CACid.253_index1.GCAGTT.sortedrealign.bam -I CACid.260_index1.CCTGAA.sortedrealign.bam \
-I CACid.262_index1.AAGCGA.sortedrealign.bam -I CACid.257_index1.AACTTA.sortedrealign.bam -I CACid.258_index1.GTTCCT.sortedrealign.bam -I CACid.216_index1.CTAATA.sortedrealign.bam \
-I CACid.220_index1.CCGAAT.sortedrealign.bam -I CACid.223_index1.CTTGTA.sortedrealign.bam -I CACid.224_index1.CCCTCG.sortedrealign.bam -I CACid.228_index1.ACTGAC.sortedrealign.bam \
-I CACid.229_index1.GTGACT.sortedrealign.bam -I CACid.281_index1.ACTAGC.sortedrealign.bam -I CACid.282a_index1.CGACTA.sortedrealign.bam -I CACid.284_index1.GTGAGA.sortedrealign.bam \
-I CACid.286_index1.AATCTA.sortedrealign.bam -I CACid.287_index1.CAAGCT.sortedrealign.bam -I CACid.288_index1.CTCTCA.sortedrealign.bam -I CACid.072_index1.ACTCCT.sortedrealign.bam \
-I CACid.077_index1.GATCAA.sortedrealign.bam -I CACid.078_index1.GGCTTA.sortedrealign.bam -I CACid.084_index1.GTGGAA.sortedrealign.bam -I CACid.087a_index1.AAATGA.sortedrealign.bam \
-I CACid.087b_index1.GGGTAA.sortedrealign.bam -I CACid.091_index1.CGTCAA.sortedrealign.bam -I CACid.092_index1.GTTGCA.sortedrealign.bam -I CACid.103_index1.CTCACT.sortedrealign.bam \
-I CACid.095_index1.TCCATA.sortedrealign.bam -I CACid.100_index1.ACTCGA.sortedrealign.bam -I CACid.101_index1.GTTCGA.sortedrealign.bam -I CACid.106_index1.ATATAC.sortedrealign.bam \
-I CACid.108_index1.TTAGTA.sortedrealign.bam -I CACid.110_index1.TCGCTT.sortedrealign.bam -I CACid.112_index1.TCGTCT.sortedrealign.bam -I CACid.144_index1.CAATAT.sortedrealign.bam \
-I CACid.149_index1.TCATGG.sortedrealign.bam -I CACid.321_index1.CAGGAA.sortedrealign.bam -I CACid.323_index1.ATCGAC.sortedrealign.bam -I CACid.324_index1.TTGCGA.sortedrealign.bam \
-o CACid.MIM.sites.q40.6.14.15.vcf \
--metrics_file CACid.MIM.sites.q40.6.14.15.metrics.txt \
--genotype_likelihoods_model SNP \
--output_mode EMIT_ALL_CONFIDENT_SITES \
-stand_call_conf 40 \
-stand_emit_conf 40 \
-l INFO


Viewing all articles
Browse latest Browse all 624