Quantcast
Channel: vcf — GATK-Forum
Viewing all articles
Browse latest Browse all 624

Empty VCF files using UnifiedGenotyper

$
0
0

Hi,

I must be doing something silly because all of my VCF files are empty (header, no results) after calling with Unified Genotyper. I'm using GATK version 2.7-2-g6bda569 with Java 1.7.00_40. The FASTQs were aligned with Novoalign.

java -Xmx4g -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK.jar -R hg19_random.fa -D dbsnp_137.hg19.vcf -T UnifiedGenotyper -I data/gatk.realigned.recal.chr1-1-249250621.bam -o data/gatk.snps.raw.chr1-1-249250621.vcf -metrics data/gatk.snps.raw.chr1-1-249250621.vcf.metrics -L chr1:1-249250621 --computeSLOD -stand_call_conf 30 -stand_emit_conf 1 -nt 8 -K GATK_public.key -et NO_ET

Here is a snippet from my BAM file:

H801:144:C39TBACXX:7:2113:7016:95747    73      chr1    671881  20      101M    =       671881  0       GCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTCGGCCTGGAGAG YYYZZZZZZZZZZZZYZ]ZZ]]]YYYZ]]]ZZ]]]Z]Z]Z]]]]]Z]]]YZZZ]]]]ZYZZZZZZZZZYYYZZZZYYZZYZYZZZYZYZZZZZZZZZWXWW   BD:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      MD:Z:101        PG:Z:novoalign  RG:Z:SWID:testing:0     BI:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      NM:i:0  UQ:i:0  AS:i:0
H801:144:C39TBACXX:7:2302:11261:7162    153     chr1    971492  70      101M    =       971492  0       GCTGTCTGTGAGGGCTGTGCTGAGGCCTTCCTGACCAGCACATGGGGTGGGAAGGACGACCTGGGGAATCCTGAAGTGATCTGAAGACAGAGCCCTGGGCT   WYWXYZZZZZYZZXZYZZZYZZZZZZYYZZZZZZZZZZZZZZY]]]]ZZYYYZZZYYZZZZ]]]ZZZYZXZ]ZZZZZZYZ]]]]ZZYZZZZYZZZZZZYYY   BD:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      MD:Z:101        PG:Z:novoalign  RG:Z:SWID:testing:0     BI:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      NM:i:0  UQ:i:0  AS:i:0
H801:144:C39TBACXX:7:2302:13699:15898   153     chr1    971492  70      101M    =       971492  0       GCTGTCTGTGAGGGCTGTGCTGAGGCCTTCCTGACCAGCACATGGGGTGGGAAGGACGACCTGGGGAATCCTGAAGTGATCTGAAGACAGAGCCCTGGGCT   XYXZZYZZZZYZZZYYXYZZYZZZZZYYZZZZYYZZYYZZZYY]]]]]ZZZ]]]ZZZYZ]]Z]]Z]ZZZ]]]]]]ZZYZZ]]ZZYYYZYZXZZZZZZZYYX   BD:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      MD:Z:101        PG:Z:novoalign  RG:Z:SWID:testing:0     BI:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      NM:i:0  UQ:i:0  AS:i:0
H801:144:C39TBACXX:7:2206:12702:96307   97      chr1    987905  70      101M    =       988021  217     GTGTGCATATGGGTCCATGTATGTGTGTGTATATGAGGGAGACACGCAGGTGTGTGTCTGAGTGTGTGCGCACATGGGTCCATGTATGTGTGTGTATAGGT   YXYZZZZZZZZZZZZ]]]]ZZZ]ZZZYYZZZZ]]]]]]]Z]ZZ]Z]]]]ZZZZZZZYZZ]ZZXYYYWYYZWXWXYWWWYZZYZYZZZYZZZZWXWWYZZYW   BD:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      MD:Z:101        PG:Z:novoalign  RG:Z:SWID:testing:0     BI:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]      NM:i:0  MQ:i:70 UQ:i:0  AS:i:0
H801:144:C39TBACXX:7:2206:12702:96307   145     chr1    988021  70      101M    =       987905  -217    GTGTGTGTCCGTGTGTGTGCATGGGTCCATGTGTGTATAGTGTGTACACATGGGTCCATGTATGTGTGTGTATATGAGGGAGACACGCAGGTGTGTGTCCG   WWWWWZXZXXWWWXZYWWWYWWWWWWWXWXZYXYWWYYWX]]]ZYXZZYZZ]]]]]]]Z]Z]]]]]]]]]]]]]Z]]]]]]]]]Z]]]ZZZZZZZZZZYYY   BD:Z:]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]       

And the output on the command line when running

INFO  12:44:55,654 HelpFormatter - --------------------------------------------------------------------------------
INFO  12:44:55,656 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-2-g6bda569, Compiled 2013/08/28 16:30:29
INFO  12:44:55,656 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  12:44:55,656 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  12:44:55,660 HelpFormatter - Program Args: -R hg19_random.fa -D dbsnp_137.hg19.vcf -T UnifiedGenotyper -I data/gatk.realigned.recal.chr1-1-249250621.bam -o data/gatk.snps.raw.chr1-1-249250621.vcf -metrics data/gatk.snps.raw.chr1-1-249250621.vcf.metrics -et NO_ET -L chr1:1-249250621 --computeSLOD -stand_call_conf 30 -stand_emit_conf 1 -nt 8 -K GATK_public.key
INFO  12:44:55,660 HelpFormatter - Date/Time: 2014/01/16 12:44:55
INFO  12:44:55,661 HelpFormatter - --------------------------------------------------------------------------------
INFO  12:44:55,661 HelpFormatter - --------------------------------------------------------------------------------
INFO  12:44:55,755 ArgumentTypeDescriptor - Dynamically determined type of dbsnp_137.hg19.vcf to be VCF
INFO  12:44:56,282 GenomeAnalysisEngine - Strictness is SILENT
INFO  12:44:56,370 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO  12:44:56,377 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:56,433 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.05
INFO  12:44:56,490 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:56,736 IntervalUtils - Processing 249250621 bp from intervals
INFO  12:44:56,749 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 1 CPU thread(s) for each of 8 data thread(s), of 24 processors available on this machine
INFO  12:44:56,799 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  12:44:56,992 GenomeAnalysisEngine - Done preparing for traversal
INFO  12:44:56,992 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  12:44:56,992 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
INFO  12:44:57,085 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,090 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.00
INFO  12:44:57,095 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,101 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO  12:44:57,105 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,108 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:57,112 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO  12:44:57,114 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,124 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO  12:44:57,129 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,140 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO  12:44:57,168 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,177 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO  12:44:57,186 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  12:44:57,191 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.00
INFO  12:44:57,285 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:57,482 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:57,655 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:57,827 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:57,996 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:44:58,301 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp_137.hg19.vcf
INFO  12:45:26,995 ProgressMeter -   chr1:93237277        9.18e+07   30.0 s        0.0 s     37.4%        80.0 s    50.0 s
INFO  12:45:56,999 ProgressMeter -  chr1:200739385        1.99e+08   60.0 s        0.0 s     80.5%        74.0 s    14.0 s
INFO  12:46:12,672 ProgressMeter -            done        2.49e+08   75.0 s        0.0 s    100.0%        75.0 s     0.0 s
INFO  12:46:12,673 ProgressMeter - Total runtime 75.68 secs, 1.26 min, 0.02 hours
INFO  12:46:12,673 MicroScheduler - 5225 reads were filtered out during the traversal out of approximately 500463 total reads (1.04%)
INFO  12:46:12,673 MicroScheduler -   -> 5225 reads (1.04% of total) failing BadMateFilter
INFO  12:46:12,673 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  12:46:12,673 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  12:46:12,674 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO  12:46:12,674 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  12:46:12,674 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO  12:46:12,674 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter

Can anyone tell me what might be wrong? It's reliably failing for every chromosome on reasonably-sized data (this BAM is 20M with ~500k reads).

Thanks in advance


Viewing all articles
Browse latest Browse all 624

Latest Images

Trending Articles



Latest Images