Channel: vcf — GATK-Forum

Wrong results with QD, bug?

@valentin @Geraldine_VdAuwera
Hello, I ran HaplotypeCaller twice, then compared the resulting VCF files with the Notepad++ Compare plugin. I found different values for the QD annotation:

java -Xmx12G -jar $gatk -T HaplotypeCaller -R $fasta -I $data/sra/SRR948862/SRR948862.recal.bam -D $dbsnp --genotyping_mode DISCOVERY -o $data/sra/SRR948862/SRR948862.vcf -nct 12
Note -nct 12: 12 threads, i.e. more than one!
chrM 12706 rs267606893 T C 6851.77 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.963;ClippingRankSum=0.000;DB;DP=194;ExcessHet=3.0103;FS=3.313;MLEAC=2;MLEAF=1.00;MQ=59.39;MQRankSum=-0.210;QD=33.85;ReadPosRankSum=0.342;SOR=0.461 GT:AD:DP:GQ:PL 1/1:1,192:193:99:6880,544,0
chrM 12706 rs267606893 T C 6851.77 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.963;ClippingRankSum=0.000;DB;DP=194;ExcessHet=3.0103;FS=3.313;MLEAC=2;MLEAF=1.00;MQ=59.39;MQRankSum=-0.210;QD=27.78;ReadPosRankSum=0.342;SOR=0.461 GT:AD:DP:GQ:PL 1/1:1,192:193:99:6880,544,0
We know QD = QUAL/AD (https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_annotator_QualByDepth.php), but both of these QD values are wrong, whereas the records with a correct QD have the same value in both files.
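To make the comparison concrete, here is a minimal Python sketch that recomputes QUAL divided by the summed AD for the record quoted above. It assumes the poster's reading of the formula (QD = QUAL/AD, with AD summed over ref and alt); GATK's actual implementation may normalize differently. The function name `naive_qd` is made up for illustration.

```python
# Record copied from the post (first run); whitespace-separated VCF columns.
record = (
    "chrM 12706 rs267606893 T C 6851.77 . "
    "AC=2;AF=1.00;AN=2;BaseQRankSum=0.963;ClippingRankSum=0.000;DB;DP=194;"
    "ExcessHet=3.0103;FS=3.313;MLEAC=2;MLEAF=1.00;MQ=59.39;MQRankSum=-0.210;"
    "QD=33.85;ReadPosRankSum=0.342;SOR=0.461 "
    "GT:AD:DP:GQ:PL 1/1:1,192:193:99:6880,544,0"
)


def naive_qd(vcf_line):
    """Return (QUAL / sum(AD), reported QD) for a single-sample VCF record.

    Assumption: QD is simply QUAL over the summed allelic depths, as the
    post states. This is not taken from the GATK source code.
    """
    fields = vcf_line.split()
    qual = float(fields[5])
    # INFO flags such as DB carry no "=" and are skipped here.
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    depth = sum(int(x) for x in sample["AD"].split(","))
    return qual / depth, float(info["QD"])


expected, reported = naive_qd(record)
print(f"QUAL/AD = {expected:.2f}, reported QD = {reported:.2f}")
```

With the values from the post, 6851.77 / 193 is about 35.50, which matches neither 33.85 nor 27.78.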


Single-Sample Genotyping: Different Workflow?

Hi,
If I need to independently call and genotype a single sample, is there a different workflow or set of GATK tools and settings that I ought to use instead of using HaplotypeCaller to generate a GVCF and then using GenotypeGVCFs to genotype a "batch of 1"?

(In other words, is there another tool or setting that will go directly from BAM to VCF and give better or significantly different results than the above?)

I'm aware of the many benefits of the GATK best practices workflow for singly-calling BAMs to GVCFs and then jointly genotyping the batches; however, at the moment I am doing some benchmarking for a project where a joint-calling pipeline may not be feasible and we may need to call each sample independently.

Thanks!
James

CombineGVCFs includes header from dbsnp file?

Dear GATK team,

I have encountered some strange behaviour when running CombineGVCFs (version 3.7-0). When I include a dbsnp file with the -D flag, it appears the entire header of this file is included in the output file. I haven't seen similar behaviour in any of the other tools. Is this expected behaviour? I am concerned this may overwrite header fields that already existed. Also, the copied INFO fields are not used in any of the records.
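One way to see exactly which header entries were introduced is to compare the structured `##KEY=<ID=...>` lines of the two headers. The sketch below is only an illustration: the helper names `header_ids` and `added_ids` are hypothetical, and the sample lines are a small subset of the headers shown in this post.

```python
import re

# Matches structured header lines such as ##INFO=<ID=DP,...> and captures
# the key (INFO, FILTER, FORMAT, contig, ...) and the ID value.
ID_PATTERN = re.compile(r"^##(\w+)=<ID=([^,>]+)")


def header_ids(header_lines):
    """Collect (key, ID) pairs from VCF header lines."""
    ids = set()
    for line in header_lines:
        match = ID_PATTERN.match(line)
        if match:
            ids.add((match.group(1), match.group(2)))
    return ids


def added_ids(before, after):
    """Return (key, ID) pairs present in `after` but not in `before`."""
    return sorted(header_ids(after) - header_ids(before))


# Subset of the header before CombineGVCFs.
before = [
    '##FILTER=<ID=LowQual,Description="Low quality">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">',
]
# Subset of the header after CombineGVCFs with -D.
after = before + [
    '##FILTER=<ID=NC,Description="Inconsistent Genotype Submission For At Least One Sample">',
    '##INFO=<ID=ASP,Number=0,Type=Flag,Description="Is Assembly specific. This is set if the variant only maps to one assembly">',
    '##INFO=<ID=ASS,Number=0,Type=Flag,Description="In acceptor splice site FxnCode = 73">',
]

print(added_ids(before, after))
```

The same two functions could be fed the `grep "#"` output of each file, read line by line, to list every entry that appears only after the dbSNP file is supplied. A pair that occurs in both headers would not show up here, so checking for overwritten definitions would need a separate comparison of the full lines.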

As an example, here is a VCF header before CombineGVCFs:

$ zcat NA12878.mini.g.vcf.gz | head -n 1000 | grep "#"

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Sat Jun 17 09:17:19 CEST 2017",Epoch=1497683839645,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/exports/sasc/testing/workspace/Biopet-Functional-Tests/135/output/shiva.ShivaBiopetplanet30xHg19Test/samples/NA12878/NA12878.realign.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=[/exports/sasc/testing/workspace/Biopet-Functional-Tests/135/output/shiva.ShivaBiopetplanet30xHg19Test/variantcalling/haplotypecaller_gvcf/.scatter/haplotypecaller:NA12878.g.vcf.gz-sg/temp_001_of_100/scatter.intervals] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/exports/genomes/species/H.sapiens/hg19/reference.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT 
allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 
minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
##GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)
##GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)
##GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)
##GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)
##GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)
##GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)
##GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)
##GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)
##GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)
##GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)
##GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)
##GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)
##GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)
##GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)
##GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)
##GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)
##GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)
##GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)
##GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)
##GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)
##GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)
##GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)
##GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)
##GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)
##GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)
##GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)
##GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)
##GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)
##GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)
##GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)
##GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)
##GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)
##GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)
##GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)
##GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)
##GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)
##GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)
##GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)
##GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)
##GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)
##GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)
##GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)
##GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)
##GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)
##GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)
##GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)
##GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)
##GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)
##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
##GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)
##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
##GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)
##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##contig=<ID=chrY,length=59373566>
##contig=<ID=chr1_gl000191_random,length=106433>
##contig=<ID=chr1_gl000192_random,length=547496>
##contig=<ID=chr4_gl000193_random,length=189789>
##contig=<ID=chr4_gl000194_random,length=191469>
##contig=<ID=chr7_gl000195_random,length=182896>
##contig=<ID=chr8_gl000196_random,length=38914>
##contig=<ID=chr8_gl000197_random,length=37175>
##contig=<ID=chr9_gl000198_random,length=90085>
##contig=<ID=chr9_gl000199_random,length=169874>
##contig=<ID=chr9_gl000200_random,length=187035>
##contig=<ID=chr9_gl000201_random,length=36148>
##contig=<ID=chr11_gl000202_random,length=40103>
##contig=<ID=chr17_gl000203_random,length=37498>
##contig=<ID=chr17_gl000204_random,length=81310>
##contig=<ID=chr17_gl000205_random,length=174588>
##contig=<ID=chr17_gl000206_random,length=41001>
##contig=<ID=chr18_gl000207_random,length=4262>
##contig=<ID=chr19_gl000208_random,length=92689>
##contig=<ID=chr19_gl000209_random,length=159169>
##contig=<ID=chr21_gl000210_random,length=27682>
##contig=<ID=chrUn_gl000211,length=166566>
##contig=<ID=chrUn_gl000212,length=186858>
##contig=<ID=chrUn_gl000213,length=164239>
##contig=<ID=chrUn_gl000214,length=137718>
##contig=<ID=chrUn_gl000215,length=172545>
##contig=<ID=chrUn_gl000216,length=172294>
##contig=<ID=chrUn_gl000217,length=172149>
##contig=<ID=chrUn_gl000218,length=161147>
##contig=<ID=chrUn_gl000219,length=179198>
##contig=<ID=chrUn_gl000220,length=161802>
##contig=<ID=chrUn_gl000221,length=155397>
##contig=<ID=chrUn_gl000222,length=186861>
##contig=<ID=chrUn_gl000223,length=180455>
##contig=<ID=chrUn_gl000224,length=179693>
##contig=<ID=chrUn_gl000225,length=211173>
##contig=<ID=chrUn_gl000226,length=15008>
##contig=<ID=chrUn_gl000227,length=128374>
##contig=<ID=chrUn_gl000228,length=129120>
##contig=<ID=chrUn_gl000229,length=19913>
##contig=<ID=chrUn_gl000230,length=43691>
##contig=<ID=chrUn_gl000231,length=27386>
##contig=<ID=chrUn_gl000232,length=40652>
##contig=<ID=chrUn_gl000233,length=45941>
##contig=<ID=chrUn_gl000234,length=40531>
##contig=<ID=chrUn_gl000235,length=34474>
##contig=<ID=chrUn_gl000236,length=41934>
##contig=<ID=chrUn_gl000237,length=45867>
##contig=<ID=chrUn_gl000238,length=39939>
##contig=<ID=chrUn_gl000239,length=33824>
##contig=<ID=chrUn_gl000240,length=41933>
##contig=<ID=chrUn_gl000241,length=42152>
##contig=<ID=chrUn_gl000242,length=43523>
##contig=<ID=chrUn_gl000243,length=43341>
##contig=<ID=chrUn_gl000244,length=39929>
##contig=<ID=chrUn_gl000245,length=36651>
##contig=<ID=chrUn_gl000246,length=38154>
##contig=<ID=chrUn_gl000247,length=36422>
##contig=<ID=chrUn_gl000248,length=39786>
##contig=<ID=chrUn_gl000249,length=38502>
##contig=<ID=chr6_apd_hap1,length=4622290>
##contig=<ID=chr6_cox_hap2,length=4795371>
##contig=<ID=chr6_dbb_hap3,length=4610396>
##contig=<ID=chr6_mann_hap4,length=4683263>
##contig=<ID=chr6_mcf_hap5,length=4833398>
##contig=<ID=chr6_qbl_hap6,length=4611984>
##contig=<ID=chr6_ssto_hap7,length=4928567>
##contig=<ID=chr4_ctg9_hap1,length=590426>
##contig=<ID=chr17_ctg5_hap1,length=1680828>
##reference=file:///exports/genomes/species/H.sapiens/hg19/reference.fa
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA12878

After running java -jar /exports/kg/programs/GenomeAnalysisTK-3.7.0/GenomeAnalysisTK.jar -T CombineGVCFs -R /exports/genomes/species/H.sapiens/hg19/reference.fa -D /exports/kg/references/dbSNP146.grch37.fixed.vcf.gz -V NA12878.mini.g.vcf.gz -o test.combined.gvcf.vcf.gz -L chr1:1-940097:

$ zcat test.combined.gvcf.vcf.gz | head -n 1000 | grep "#" 

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=NC,Description="Inconsistent Genotype Submission For At Least One Sample">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine.CombineGVCFs=<ID=CombineGVCFs,Version=3.7-0-gcfedb67,Date="Fri Jun 23 13:40:14 CEST 2017",Epoch=1498218014342,CommandLineOptions="analysis_type=CombineGVCFs input_file=[] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=[chr1:1-940097] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/exports/genomes/species/H.sapiens/hg19/reference.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false annotation=[AS_RMSMappingQuality] dbsnp=(RodBinding name=dbsnp 
source=/exports/kg/references/dbSNP146.grch37.fixed.vcf.gz) variant=[(RodBindingCollection [(RodBinding name=variant source=NA12878.mini.g.vcf.gz)])] out=/exports/sasc/ahbbollen/combinegvcfs_test/test.combined.gvcf.vcf.gz convertToBasePairResolution=false breakBandsAtMultiplesOf=0 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Sat Jun 17 09:17:19 CEST 2017",Epoch=1497683839645,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/exports/sasc/testing/workspace/Biopet-Functional-Tests/135/output/shiva.ShivaBiopetplanet30xHg19Test/samples/NA12878/NA12878.realign.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=[/exports/sasc/testing/workspace/Biopet-Functional-Tests/135/output/shiva.ShivaBiopetplanet30xHg19Test/variantcalling/haplotypecaller_gvcf/.scatter/haplotypecaller:NA12878.g.vcf.gz-sg/temp_001_of_100/scatter.intervals] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/exports/genomes/species/H.sapiens/hg19/reference.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT 
allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 
minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
##GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)
##GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)
##GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)
##GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)
##GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)
##GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)
##GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)
##GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)
##GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)
##GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)
##GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)
##GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)
##GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)
##GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)
##GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)
##GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)
##GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)
##GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)
##GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)
##GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)
##GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)
##GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)
##GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)
##GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)
##GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)
##GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)
##GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)
##GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)
##GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)
##GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)
##GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)
##GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)
##GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)
##GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)
##GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)
##GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)
##GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)
##GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)
##GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)
##GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)
##GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)
##GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)
##GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)
##GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)
##GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)
##GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)
##GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)
##GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)
##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
##GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)
##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
##GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)
##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)
##INFO=<ID=ASP,Number=0,Type=Flag,Description="Is Assembly specific. This is set if the variant only maps to one assembly">
##INFO=<ID=ASS,Number=0,Type=Flag,Description="In acceptor splice site FxnCode = 73">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=CAF,Number=.,Type=String,Description="An ordered, comma delimited list of allele frequencies based on 1000Genomes, starting with the reference allele followed by alternate alleles as ordered in the ALT column. Where a 1000Genomes alternate allele is not in the dbSNPs alternate allele set, the allele is added to the ALT column.  The minor allele is the second largest value in the list, and was previuosly reported in VCF as the GMAF.  This is the GMAF reported on the RefSNP and EntrezSNP pages and VariationReporter">
##INFO=<ID=CDA,Number=0,Type=Flag,Description="Variation is interrogated in a clinical diagnostic assay">
##INFO=<ID=CFL,Number=0,Type=Flag,Description="Has Assembly conflict. This is for weight 1 and 2 variant that maps to different chromosomes on different assemblies.">
##INFO=<ID=COMMON,Number=1,Type=Integer,Description="RS is a common SNP.  A common SNP is one that has at least one 1000Genomes population with a minor allele of frequency >= 1% and for which 2 or more founders contribute to that minor allele frequency.">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=DSS,Number=0,Type=Flag,Description="In donor splice-site FxnCode = 75">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=G5,Number=0,Type=Flag,Description=">5% minor allele frequency in 1+ populations">
##INFO=<ID=G5A,Number=0,Type=Flag,Description=">5% minor allele frequency in each and all populations">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id.  The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=GNO,Number=0,Type=Flag,Description="Genotypes available. The variant has individual genotype (in SubInd table).">
##INFO=<ID=HD,Number=0,Type=Flag,Description="Marker is on high density genotyping kit (50K density or greater).  The variant may have phenotype associations present in dbGaP.">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=INT,Number=0,Type=Flag,Description="In Intron FxnCode = 6">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=KGPhase1,Number=0,Type=Flag,Description="1000 Genome phase 1 (incl. June Interim phase 1)">
##INFO=<ID=KGPhase3,Number=0,Type=Flag,Description="1000 Genome phase 3">
##INFO=<ID=LSD,Number=0,Type=Flag,Description="Submitted from a locus-specific database">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=MTP,Number=0,Type=Flag,Description="Microattribution/third-party annotation(TPA:GWAS,PAGE)">
##INFO=<ID=MUT,Number=0,Type=Flag,Description="Is mutation (journal citation, explicit fact): a low frequency variation that is cited in journal and other reputable sources">
##INFO=<ID=NOC,Number=0,Type=Flag,Description="Contig allele not present in variant allele list. The reference sequence allele at the mapped position is not present in the variant allele list, adjusted for orientation.">
##INFO=<ID=NOV,Number=0,Type=Flag,Description="Rs cluster has non-overlapping allele sets. True when rs set has more than 2 alleles from different submissions and these sets share no alleles in common.">
##INFO=<ID=NSF,Number=0,Type=Flag,Description="Has non-synonymous frameshift A coding region variation where one allele in the set changes all downstream amino acids. FxnClass = 44">
##INFO=<ID=NSM,Number=0,Type=Flag,Description="Has non-synonymous missense A coding region variation where one allele in the set changes protein peptide. FxnClass = 42">
##INFO=<ID=NSN,Number=0,Type=Flag,Description="Has non-synonymous nonsense A coding region variation where one allele in the set changes to STOP codon (TER). FxnClass = 41">
##INFO=<ID=OM,Number=0,Type=Flag,Description="Has OMIM/OMIA">
##INFO=<ID=OTH,Number=0,Type=Flag,Description="Has other variant with exactly the same set of mapped positions on NCBI refernce assembly.">
##INFO=<ID=PM,Number=0,Type=Flag,Description="Variant is Precious(Clinical,Pubmed Cited)">
##INFO=<ID=PMC,Number=0,Type=Flag,Description="Links exist to PubMed Central article">
##INFO=<ID=R3,Number=0,Type=Flag,Description="In 3' gene region FxnCode = 13">
##INFO=<ID=R5,Number=0,Type=Flag,Description="In 5' gene region FxnCode = 15">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=REF,Number=0,Type=Flag,Description="Has reference A coding region variation where one allele in the set is identical to the reference sequence. FxnCode = 8">
##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=RSPOS,Number=1,Type=Integer,Description="Chr position reported in dbSNP">
##INFO=<ID=RV,Number=0,Type=Flag,Description="RS orientation is reversed">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=S3D,Number=0,Type=Flag,Description="Has 3D structure - SNP3D table">
##INFO=<ID=SAO,Number=1,Type=Integer,Description="Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both">
##INFO=<ID=SLO,Number=0,Type=Flag,Description="Has SubmitterLinkOut - From SNP->SubSNP->Batch.link_out">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes (may be more than one value added together) 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
##INFO=<ID=SYN,Number=0,Type=Flag,Description="Has synonymous A coding region variation where one allele in the set does not change the encoded amino acid. FxnCode = 3">
##INFO=<ID=TPA,Number=0,Type=Flag,Description="Provisional Third Party Annotation(TPA) (currently rs from PHARMGKB who will give phenotype data)">
##INFO=<ID=U3,Number=0,Type=Flag,Description="In 3' UTR Location is in an untranslated region (UTR). FxnCode = 53">
##INFO=<ID=U5,Number=0,Type=Flag,Description="In 5' UTR Location is in an untranslated region (UTR). FxnCode = 55">
##INFO=<ID=VC,Number=1,Type=String,Description="Variation Class">
##INFO=<ID=VLD,Number=0,Type=Flag,Description="Is Validated.  This bit is set if the variant has 2+ minor allele count based on frequency or genotype data.">
##INFO=<ID=VP,Number=1,Type=String,Description="Variation Property.  Documentation is at ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf">
##INFO=<ID=WGT,Number=1,Type=Integer,Description="Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2, 3 - weight 3 or more">
##INFO=<ID=WTD,Number=0,Type=Flag,Description="Is Withdrawn by submitter If one member ss is withdrawn by submitter, then this bit is set.  If all member ss' are withdrawn, then the rs is deleted to SNPHistory">
##INFO=<ID=dbSNPBuildID,Number=1,Type=Integer,Description="First dbSNP Build for RS">
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##contig=<ID=chrY,length=59373566>
##contig=<ID=chr1_gl000191_random,length=106433>
##contig=<ID=chr1_gl000192_random,length=547496>
##contig=<ID=chr4_gl000193_random,length=189789>
##contig=<ID=chr4_gl000194_random,length=191469>
##contig=<ID=chr7_gl000195_random,length=182896>
##contig=<ID=chr8_gl000196_random,length=38914>
##contig=<ID=chr8_gl000197_random,length=37175>
##contig=<ID=chr9_gl000198_random,length=90085>
##contig=<ID=chr9_gl000199_random,length=169874>
##contig=<ID=chr9_gl000200_random,length=187035>
##contig=<ID=chr9_gl000201_random,length=36148>
##contig=<ID=chr11_gl000202_random,length=40103>
##contig=<ID=chr17_gl000203_random,length=37498>
##contig=<ID=chr17_gl000204_random,length=81310>
##contig=<ID=chr17_gl000205_random,length=174588>
##contig=<ID=chr17_gl000206_random,length=41001>
##contig=<ID=chr18_gl000207_random,length=4262>
##contig=<ID=chr19_gl000208_random,length=92689>
##contig=<ID=chr19_gl000209_random,length=159169>
##contig=<ID=chr21_gl000210_random,length=27682>
##contig=<ID=chrUn_gl000211,length=166566>
##contig=<ID=chrUn_gl000212,length=186858>
##contig=<ID=chrUn_gl000213,length=164239>
##contig=<ID=chrUn_gl000214,length=137718>
##contig=<ID=chrUn_gl000215,length=172545>
##contig=<ID=chrUn_gl000216,length=172294>
##contig=<ID=chrUn_gl000217,length=172149>
##contig=<ID=chrUn_gl000218,length=161147>
##contig=<ID=chrUn_gl000219,length=179198>
##contig=<ID=chrUn_gl000220,length=161802>
##contig=<ID=chrUn_gl000221,length=155397>
##contig=<ID=chrUn_gl000222,length=186861>
##contig=<ID=chrUn_gl000223,length=180455>
##contig=<ID=chrUn_gl000224,length=179693>
##contig=<ID=chrUn_gl000225,length=211173>
##contig=<ID=chrUn_gl000226,length=15008>
##contig=<ID=chrUn_gl000227,length=128374>
##contig=<ID=chrUn_gl000228,length=129120>
##contig=<ID=chrUn_gl000229,length=19913>
##contig=<ID=chrUn_gl000230,length=43691>
##contig=<ID=chrUn_gl000231,length=27386>
##contig=<ID=chrUn_gl000232,length=40652>
##contig=<ID=chrUn_gl000233,length=45941>
##contig=<ID=chrUn_gl000234,length=40531>
##contig=<ID=chrUn_gl000235,length=34474>
##contig=<ID=chrUn_gl000236,length=41934>
##contig=<ID=chrUn_gl000237,length=45867>
##contig=<ID=chrUn_gl000238,length=39939>
##contig=<ID=chrUn_gl000239,length=33824>
##contig=<ID=chrUn_gl000240,length=41933>
##contig=<ID=chrUn_gl000241,length=42152>
##contig=<ID=chrUn_gl000242,length=43523>
##contig=<ID=chrUn_gl000243,length=43341>
##contig=<ID=chrUn_gl000244,length=39929>
##contig=<ID=chrUn_gl000245,length=36651>
##contig=<ID=chrUn_gl000246,length=38154>
##contig=<ID=chrUn_gl000247,length=36422>
##contig=<ID=chrUn_gl000248,length=39786>
##contig=<ID=chrUn_gl000249,length=38502>
##contig=<ID=chr6_apd_hap1,length=4622290>
##contig=<ID=chr6_cox_hap2,length=4795371>
##contig=<ID=chr6_dbb_hap3,length=4610396>
##contig=<ID=chr6_mann_hap4,length=4683263>
##contig=<ID=chr6_mcf_hap5,length=4833398>
##contig=<ID=chr6_qbl_hap6,length=4611984>
##contig=<ID=chr6_ssto_hap7,length=4928567>
##contig=<ID=chr4_ctg9_hap1,length=590426>
##contig=<ID=chr17_ctg5_hap1,length=1680828>
##dbSNP_BUILD_ID=146
##fileDate=20151104
##phasing=partial
##reference=file:///exports/genomes/species/H.sapiens/hg19/reference.fa
##source=dbSNP
##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf  
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA12878

48 new INFO fields have appeared, all of which originate from the dbsnp file.

Thanks in advance for your answer :-)

GenotypeGVCFs: no records in VCF


Dear GATK team,

I am having trouble calling genotypes on the *.gvcf files produced by HaplotypeCaller in GVCF mode.
When I run GenotypeGVCFs (GATK 3.5), the resulting VCF file contains only the header and no records.
I had no such problem before.

Could you advise on the possible cause of the issue and how to fix it?

Here is the command and output:

java -Xmx12g  -Djava.io.tmpdir=./tmp -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fa --variant  sample1.g.vcf --variant  sample2.g.vcf --variant  sample3.g.vcf --variant  sample4.g.vcf --variant  sample5.g.vcf --variant  sample6.g.vcf --variant  sample7.g.vcf --variant  sample7.g.vcf --num_threads 4 -o TEST.gt.vcf

note: there are SNPs/INDELs in sample*.g.vcf

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
...
...
...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1    sample2    sample3    sample4    sample5    sample6    sample7    sample8

Thank you!
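One detail that can be checked mechanically in the command above: sample7.g.vcf is passed twice and sample8.g.vcf is never passed, while the header lists sample8. Whether this is related to the empty output is not established here. A minimal Python sketch for spotting repeated inputs follows; the helper name `duplicated_variant_inputs` is made up for illustration.

```python
import shlex
from collections import Counter


def duplicated_variant_inputs(command):
    """Return the files that are passed more than once via --variant or -V."""
    tokens = shlex.split(command)
    inputs = [
        tokens[i + 1]
        for i, token in enumerate(tokens[:-1])
        if token in ("--variant", "-V")
    ]
    return [name for name, count in Counter(inputs).items() if count > 1]


# Command copied from the post above.
command = (
    "java -Xmx12g -Djava.io.tmpdir=./tmp -jar GenomeAnalysisTK.jar "
    "-T GenotypeGVCFs -R reference.fa "
    "--variant sample1.g.vcf --variant sample2.g.vcf --variant sample3.g.vcf "
    "--variant sample4.g.vcf --variant sample5.g.vcf --variant sample6.g.vcf "
    "--variant sample7.g.vcf --variant sample7.g.vcf "
    "--num_threads 4 -o TEST.gt.vcf"
)

print(duplicated_variant_inputs(command))
```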

Variant in VCF of multiple samples called by HaplotypeCaller absent in their respective BAM files


Hello GATK team,

I followed the GATK Best Practices and called variants with HaplotypeCaller in 6 exome samples. However, in 4 patients in total I have a variant on chr12 that is absent from the BAM file. The variant is a large deletion (more than 10 nucleotides), and I can see it in ONLY one read of one of the samples. I'm confused about what has happened here. Can you please explain how a variant can be called when it is not present in the BAM?

Problem with GATK pipeline, merging VCF and ped file.


Hi to all

I have just started using GATK and I have a few questions about some tools and about the general workflow.

I have exome-seq data from the 3 members of a trio, and I have to detect rare or private variants that segregate with the disease.

From the 3 aligned BAM files I proceeded with the GATK pipeline (ADDgroupInfo, MarkDup, Realign, BQSR, UnifiedGenotyper and variant filtration) and generated 3 VCF files.

Since I now have to use the PhaseByTransmission tool, should I merge the 3 VCF files?

Or would it have been better to merge the BAM files after adding the group info and then proceed with the rest of the analysis?

And should I create my .ped file based on the read groups that I have assigned? (I visited http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped, but I couldn't understand how a ped file is generated.)

Thanks!!!

Please help me to interpret this line. How come I have this disease?


chr1 53676448 . G A 1495.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.48;ClippingRankSum=-5.270e-01;DP=79;ExcessHet=3.0103;FS=4.485;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.775;QD=19.18;ReadPosRankSum=0.403;SOR=0.581;set=variant2 GT:AD:DP:GQ:PL:CGIANN_VARNAME:CGIANN_1000GAF:CGIANN_ESP6500AF 0/1:29,49:78:99:1524,0,780:-,NM_000098.2(CPT2) c.1102G>A (p.V368I):-,0.5:-,0.456405
chr1 53676986 . C . . . END=53678942;NT GT ./.
chr1 53679264 . T . . . END=53680317;NT GT ./.
chr1 53680529 . a . . . END=53681541;NT GT ./.
chr1 53681771 . G . . . END=53682332;NT GT ./.
chr1 53682540 . G . . . END=53683699;NT GT ./.

I only saw one mutation with quite some information, and several other lines without information. How come I have a cpt-2 deficiency?
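To make the first line easier to read, here is a minimal Python sketch that pairs the FORMAT keys with the sample values. It only decodes the fields and says nothing about clinical significance. The custom CGIANN_* subfields are left out for brevity, and the variable names are illustrative.

```python
# FORMAT keys and sample values copied from the first record above
# (the CGIANN_* annotation subfields are omitted here).
format_keys = "GT:AD:DP:GQ:PL".split(":")
sample_values = "0/1:29,49:78:99:1524,0,780".split(":")

call = dict(zip(format_keys, sample_values))

# AD lists the read counts per allele in the order REF, ALT.
ref_reads, alt_reads = (int(x) for x in call["AD"].split(","))
alt_fraction = alt_reads / (ref_reads + alt_reads)

# GT 0/1 means one copy of the reference allele (G) and one copy of the
# alternate allele (A), i.e. a heterozygous call.
print("genotype:", call["GT"])
print("reads supporting REF/ALT:", ref_reads, alt_reads)
print("fraction of reads supporting ALT:", round(alt_fraction, 3))
```

The remaining lines (with `.` in the ALT column and an END value in INFO) carry no alternate allele and a `./.` genotype, so they do not describe a called variant.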

How to download the latest COSMIC VCF file


I want to run Mutect2 with the latest COSMIC file, but I can't find where to download it.
I searched the forum and found that others advise downloading the COSMIC file from "ftp://ngs.sanger.ac.uk/production/cosmic", but I can't find anything at that address.
Could you help me download the latest COSMIC VCF file? Thanks,


SnpEff html and .vcf file result are not matching


Asslamu Alikum

I have successfully run SnpEff on my VCF files. However, the counts of missense variants in the HTML report and in the VCF file generated by SnpEff are different.

Missense in HTML: 20,854

Missense in VCF file: 20,754

Can anyone please tell me which criteria are used to count missense variants in the HTML report, so that I can reconcile it with the count from the VCF file?

The input is a VCF, and the SnpEff version is 4.3.
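One way two missense counts can diverge is if one counts variant records and the other counts individual annotations (a record can carry several annotations, e.g. one per transcript). Whether this explains the 20,854 vs 20,754 difference is an assumption, not something confirmed in this thread. A minimal Python sketch that reports both numbers from the `ANN` INFO field follows; the function name and the toy input lines are made up for illustration.

```python
def missense_counts(vcf_lines):
    """Return (records with >=1 missense annotation, total missense annotations)."""
    records_with_missense = 0
    missense_annotations = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]
        ann_values = [f[len("ANN="):] for f in info.split(";") if f.startswith("ANN=")]
        hits = 0
        for entry in (ann_values[0].split(",") if ann_values else []):
            parts = entry.split("|")
            # The second subfield holds the effect term(s), joined by '&'.
            if len(parts) > 1 and "missense_variant" in parts[1].split("&"):
                hits += 1
        missense_annotations += hits
        records_with_missense += 1 if hits else 0
    return records_with_missense, missense_annotations


# Toy records (hypothetical, with truncated ANN entries), not taken from the post.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tG\tA\t50\tPASS\tANN=A|missense_variant|MODERATE|GENE1,A|missense_variant|MODERATE|GENE1",
    "chr1\t200\t.\tC\tT\t50\tPASS\tANN=T|synonymous_variant|LOW|GENE2",
]

print(missense_counts(toy_vcf))
```

On real data, the lines would come from iterating over the opened SnpEff output VCF.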

How does HaplotypeCaller discriminate between heterozygous and homozygous variants?


Dear members of the GATK team

I am using different GATK modules to detect SNPs in my RNA-Seq data set. I did a test run for one individual to get an idea of the HaplotypeCaller output. I know that I still need to filter my variants, but nevertheless I was wondering how HaplotypeCaller decides whether a variant is heterozygous or homozygous. There must be another parameter to take into account (other than the AD values). Am I right?

Here is an example:

0|*|TRINITY_DN53108_c0_g1::TRINITY_DN53108_c0_g1_i1::g.132814::m.132814 7333 . G T 42.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=1.644;ClippingRankSum=0.000;DP=21;ExcessHet=3.0103;FS=3.109;MLEAC=1;MLEAF=0.500;MQ=42.00;MQRankSum=0.000;QD=2.04;ReadPosRankSum=0.629;SOR=0.132 GT:AD:DP:GQ:PL 0/1:18,3:21:71:71,0,703

The genotype is 0/1 (G/T) and the AD is 18 to 3. Actually, I would say that this is homozygous.
Before mapping the reads to the reference, I checked them with FastQC and did other processing steps such as adapter trimming. I also marked and removed duplicate reads from the BAM file. So I would say my reads are processed correctly and I can trust the final reads.

Nevertheless, with a ratio of 18:3, I would still suggest a homozygous variant (just based on the AD values). I would change my mind if there is another value that is important for the decision, or if one can say: "If you trust your read files, then this ratio is still a reliable result for heterozygous variants."

But still, if I doubt the files:
Is there any way to filter the variants based on their AD values? An example would be to filter out all heterozygous variants that fall below a ratio of 30% : 70%.

Thanks in advance for your reply and I am looking forward to your answers.
Julia
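The AD-based filter asked about above could be sketched in Python as follows. The 30%/70% bounds come from the question; the function names are made up, and this is only an illustration, not a GATK feature. The sketch also reads the PL field of the example record: PL values are normalized so that the most likely genotype has PL 0, and for a biallelic site they are ordered 0/0, 0/1, 1/1.

```python
def alt_fraction(ad):
    """Fraction of reads supporting ALT, from an AD string such as '18,3'."""
    ref_reads, alt_reads = (int(x) for x in ad.split(",")[:2])
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def passes_allele_balance(gt, ad, low=0.30, high=0.70):
    """Keep non-het calls; keep het calls only if the ALT fraction is within bounds."""
    alleles = set(gt.replace("|", "/").split("/"))
    if alleles != {"0", "1"}:
        return True  # only 0/1 calls are checked in this sketch
    return low <= alt_fraction(ad) <= high


# Values from the example record: GT=0/1, AD=18,3, PL=71,0,703
pl = [71, 0, 703]
genotype_order = ["0/0", "0/1", "1/1"]
print("most likely genotype by PL:", genotype_order[pl.index(min(pl))])
print("ALT fraction:", round(alt_fraction("18,3"), 3))
print("passes 30%/70% filter:", passes_allele_balance("0/1", "18,3"))
```

With these bounds the example call (3 of 21 reads supporting T) would be filtered out, even though its PL values favour 0/1 over both homozygous genotypes.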

Haplotype Caller Makes SNPs look like INDELS


I'm using the HaplotypeCaller to look at SNPs related to antimicrobial resistance and am getting a result that looks like this:

NC_011035.1 2049708 .   CCGGCG  C   ...
NC_011035.1 2049714 .   C   CAAGAA  ...

I believe this is an alignment that would look like:
CCGGCGC
CCAAGAA

but instead of giving me 5 individual SNPs, GATK is calling the region as though it is a 5bp deletion at position 2049708 and a 5bp insertion at position 2049714.

Is there any way to change the parameters so that the appropriate call is made?

My current command is:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct 12 -R NCC_011035.fasta -I ST547_dedup_reads_group.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o ST547_raw.vcf
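The two representations can be compared with a short Python sketch. It assumes, as in the post, that the reference window starting at 2049708 is CCGGCGC; the variable names are illustrative. It lists the five substitutions and checks that applying the deletion plus the insertion yields the same sequence.

```python
ref_window = "CCGGCGC"   # reference bases 2049708..2049714, as assumed in the post
alt_window = "CCAAGAA"   # observed sequence, as assumed in the post
window_start = 2049708

# Representation 1: position-by-position substitutions.
substitutions = [
    (window_start + offset, ref_base, alt_base)
    for offset, (ref_base, alt_base) in enumerate(zip(ref_window, alt_window))
    if ref_base != alt_base
]

# Representation 2: the two records emitted by HaplotypeCaller.
# 2049708 CCGGCG -> C      (bases 2049708..2049713 collapse to 'C')
# 2049714 C      -> CAAGAA (base 2049714 expands to 'CAAGAA')
indel_haplotype = "C" + "CAAGAA"

print(substitutions)
print(indel_haplotype == alt_window)
```

Both descriptions produce the same haplotype, so the difference is in how the change is represented rather than in the resulting sequence.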

VariantsToBinaryPed java.lang.ArrayIndexOutOfBoundsException: -1


Hello, can you please help me sort out the following error in running VariantsToBinaryPed:

java -jar /sb/project/fkr-592-aa/data/GalWaRat/bin/third/gatk-3.7/GenomeAnalysisTK.jar -T VariantsToBinaryPed -R /sb/project/fkr-592-aa/genomes/CfloGapsClosed6/Cflo_3.3_gaps_closed6.fasta -V /sb/project/fkr-592-aa/Danzqianqi/Cflo/WGS/filteredSNPss.vcf -m sample_phenotypeinfo2.fam --minGenotypeQuality 0 --bed filteredSNPss.bed --bim filteredSNPss.bim --fam filteredSNPss.fam
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/gs/scratch/zqianqi
INFO 19:31:00,898 HelpFormatter - ----------------------------------------------------------------------------------
INFO 19:31:00,901 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 19:31:00,902 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 19:31:00,902 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 19:31:00,902 HelpFormatter - [Tue Sep 05 19:31:00 EDT 2017] Executing on Linux 2.6.32-642.13.1.el6.x86_64 amd64
INFO 19:31:00,902 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02
INFO 19:31:00,906 HelpFormatter - Program Args: -T VariantsToBinaryPed -R /sb/project/fkr-592-aa/genomes/CfloGapsClosed6/Cflo_3.3_gaps_closed6.fasta -V /sb/project/fkr-592-aa/Danzqianqi/Cflo/WGS/filteredSNPss.vcf -m sample_phenotypeinfo2.fam --minGenotypeQuality 0 --bed filteredSNPss.bed --bim filteredSNPss.bim --fam filteredSNPss.fam
INFO 19:31:00,910 HelpFormatter - Executing as zqianqi@lg-1r17-n03 on Linux 2.6.32-642.13.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02.
INFO 19:31:00,911 HelpFormatter - Date/Time: 2017/09/05 19:31:00
INFO 19:31:00,911 HelpFormatter - ----------------------------------------------------------------------------------
INFO 19:31:00,911 HelpFormatter - ----------------------------------------------------------------------------------
INFO 19:31:00,922 GenomeAnalysisEngine - Strictness is SILENT
INFO 19:31:47,656 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 19:32:39,018 GenomeAnalysisEngine - Preparing for traversal
INFO 19:32:39,044 GenomeAnalysisEngine - Done preparing for traversal
INFO 19:32:39,045 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 19:32:39,045 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 19:32:39,046 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -1
at htsjdk.variant.variantcontext.GenotypeLikelihoods.getGQLog10FromLikelihoods(GenotypeLikelihoods.java:220)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.checkGQIsGood(VariantsToBinaryPed.java:442)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.getStandardEncoding(VariantsToBinaryPed.java:406)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.getEncoding(VariantsToBinaryPed.java:398)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.writeIndividualMajor(VariantsToBinaryPed.java:282)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.map(VariantsToBinaryPed.java:267)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.map(VariantsToBinaryPed.java:103)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: -1
ERROR ------------------------------------------------------------------------------------------

My .vcf file was made with HaplotypeCaller/GenotypeGVCFs/SelectVariants/VariantFiltration. I used ValidateVariants as well.

This is a snapshot of the .vcf file:

##reference=file:///sb/project/fkr-592-aa/genomes/CfloGapsClosed6/Cflo_3.3_gaps_closed6.fasta

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1 12 13 15 2 9

1 30 . T C 36.19 PASS AC=1;AF=0.100;AN=10;BaseQRankSum=0.712;ClippingRankSum=0.00;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.100;MQ=30.46;MQRankSum=1.98;QD=5.17;ReadPosRankSum=0.303;SOR=0.892 GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,37 0/0:2,0:2:6:.:.:0,6,74 0/0:4,0:4:9:.:.:0,9,135 0/1:5,2:7:66:0|1:30_T_C:66,0,246 0/0:2,0:2:6:.:.:0,6,49
1 45 . A G 33.97 PASS AC=1;AF=0.100;AN=10;BaseQRankSum=1.09;ClippingRankSum=0.00;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.100;MQ=30.65;MQRankSum=2.20;QD=3.77;ReadPosRankSum=0.765;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,37 0/0:5,0:5:15:.:.:0,15,157 0/0:6,0:6:1:.:.:0,1,155 0/1:7,2:9:63:0|1:30_T_C:63,0,288 0/0:2,0:2:6:.:.:0,6,49
1 53 . C CA 24.57 PASS AC=1;AF=0.083;AN=12;BaseQRankSum=1.09;ClippingRankSum=0.00;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.083;MQ=30.65;MQRankSum=2.20;QD=2.73;ReadPosRankSum=0.765;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL 0/0:1,0:1:3:.:.:0,3,37 0/0:1,0:1:3:.:.:0,3,37 0/0:5,0:5:15:.:.:0,15,157 0/0:6,0:6:1:.:.:0,1,169 0/1:7,2:9:63:0|1:30_T_C:63,0,288 0/0:2,0:2:6:.:.:0,6,49

My .fam file looks like this
Cflo 1 0 0 0 5047.16
Cflo 12 0 0 0 6249.9
Cflo 13 0 0 0 6007.21
Cflo 15 0 0 0 7123.6
Cflo 2 0 0 0 5581.36
Cflo 9 0 0 0 7462.87

Thank you! Please let me know if you require more information!
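The stack trace ends in `getGQLog10FromLikelihoods`, and the snapshot contains a no-call sample whose PL is `0,0,0`. Whether such flat likelihoods are what triggers the `-1` index is an assumption, not something confirmed here. A minimal Python sketch for listing samples with all-equal PL values in a record follows; the function name is made up for illustration.

```python
def flat_pl_samples(vcf_line, sample_names):
    """Return the samples whose PL values are all identical (e.g. 0,0,0)."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")
    pl_index = format_keys.index("PL")
    flagged = []
    for name, call in zip(sample_names, fields[9:]):
        parts = call.split(":")
        if pl_index >= len(parts) or parts[pl_index] == ".":
            continue  # PL missing for this sample
        if len(set(parts[pl_index].split(","))) == 1:
            flagged.append(name)
    return flagged


# First record and sample names copied from the snapshot above.
samples = ["1", "12", "13", "15", "2", "9"]
record = (
    "1 30 . T C 36.19 PASS AC=1;AF=0.100;AN=10;BaseQRankSum=0.712;"
    "ClippingRankSum=0.00;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.100;"
    "MQ=30.46;MQRankSum=1.98;QD=5.17;ReadPosRankSum=0.303;SOR=0.892 "
    "GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,37 "
    "0/0:2,0:2:6:.:.:0,6,74 0/0:4,0:4:9:.:.:0,9,135 "
    "0/1:5,2:7:66:0|1:30_T_C:66,0,246 0/0:2,0:2:6:.:.:0,6,49"
)

print(flat_pl_samples(record, samples))
```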

CombineVariants in GATK4


Are there plans to add the CombineVariants tool to the GATK 4.0 toolkit (it existed in previous GATK versions)? The only similar tool currently available in the GATK 4.0 beta is GatherVCFs, which has very limited functionality and cannot concatenate unsorted VCFs or merge different INFO fields correctly.
Thanks! :)

Picard SortVcf changing VCF file version


I am using Picard SortVcf to reorder my VCF so that it matches the order of my reference genome and BAM files. It works great, but it seems to change the VCF format version from 4.0 to 4.2, which is incompatible with the downstream steps I need it for. Is there any workaround for this?

Thanks!

When I call Indels from my vcf file using GATK analysis tools I get an Error!


Hi, I ran the GATK pipeline until I had a VCF containing both SNPs and indels, and then used the GATK analysis tools to remove the SNPs and keep the indels. But after adding the reference genome, dictionary and index, I get this error:

The provided VCF file is malformed at approximately line number 455: Unparsable vcf record with allele *, for input source: /home/helenadarmancier/Documents/Estagio/Original/vcf_NoAngH201_NoMono.vcf

How can I fix this?
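The `*` in the message is the allele symbol that VCF 4.2 and later use for an allele missing because of an upstream (spanning) deletion; parsers that predate it may reject such records. A minimal Python sketch for locating the affected records follows; the function name and the toy lines are made up, and what to do with the records once found is left open.

```python
def star_allele_records(vcf_lines):
    """Yield (line number, CHROM, POS, ALT) for records whose ALT includes '*'."""
    for number, line in enumerate(vcf_lines, start=1):
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4 and "*" in fields[4].split(","):
            yield number, fields[0], fields[1], fields[4]


# Toy records (hypothetical), not taken from the file in the post.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tGAT\tG\t50\tPASS\t.",
    "chr1\t101\t.\tA\tC,*\t50\tPASS\t.",
]

for hit in star_allele_records(toy_vcf):
    print(hit)
```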


Picard LiftoverVcf


I am having a problem with picard's LiftoverVcf.

I am trying to lift over HapMap files (PLINK files downloaded from HapMap and converted to VCF using PLINK) from NCBI36 to hg38. I was able to do this with GATK LiftoverVariants. My problem came when I had to merge the hapmap.hg38 file with some genotype files (which I lifted over from hg19 to hg38 using GATK LiftoverVariants). I am merging them so that I can run population stratification using PLINK. I used vcf-merge, but it complained that one SNP has a different reference allele in the two files: rs3094315 should have reference allele G (which was correct in the genotype.hg38 files but wrong in the hapmap.hg38 file). I also tried lifting hapmap.ncbi36 to hg19 first and then to hg38, but the offending allele was still there. So I decided to try lifting hapmap.ncbi36 with LiftoverVcf from Picard.

  1. I downloaded the newest picard build (20 hours old) picard-tools-1.138.
  2. Used the command: java -jar -Xmx6000m ../../../tools/picard-tools-1.138/picard.jar LiftoverVcf I=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf O=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf C=../../../tools/liftover/chain_files/hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf R=../../../data/assemblies/hg38/hg38.fa VERBOSITY=ERROR

Here is the run:
[Thu Aug 13 00:43:40 CEST 2015] picard.vcf.LiftoverVcf INPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf OUTPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf CHAIN=......\tools\liftover\chain_files\hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf REFERENCE_SEQUENCE=......\data\assemblies\hg19\assemble\hg38.fa VERBOSITY=ERROR QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json

Here is the error:
Exception in thread "main" java.lang.IllegalStateException: Allele in genotype A* not in the variant context [T*, C]
at htsjdk.variant.variantcontext.VariantContext.validateGenotypes(VariantContext.java:1357)
at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1295)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:410)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:496)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:490)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:200)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

  1. I have no idea which SNP is the problem.
  2. I do not know what T* means (does not seem to exist in the file).
  3. I am new to Picard, so I thought VERBOSITY=ERROR would give me more information, but nothing more appeared.
  4. Given that lifting hapmap.ncbi36 to hg19 and then to hg38 produced the same erroneous reference allele, I suppose lifting will not fix this and I will have to work with dbSNP to correct my file. Do you know how I can change the reference allele in a VCF? Is there a tool for this? Is there a liftover tool for dbSNP?
  5. As a side note I want to make picard work because I read that you will be deprecating the GATK liftover and will support the picard liftover (at some point in the future) so help with this tool will be appreciated.

The result of Mutect BAM and vcf is different.


I got some VCF results using Mutect, but I have some questions about them.

  1. The allele frequency in the VCF looks strange.
    For example, below is one of my results.
    chr4 1809127 . C T . clustered_events;panel_of_normals;triallelic_site ECNT=2;HCNT=2;MAX_ED=17;MIN_ED=17;NLOD=0.00;TLOD=24.52 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:109,29:0.078:0:0:.:270,103:0:0

The ref and alt allele depths (AD) are 109 and 29, so why is the allele frequency 0.078?
Shouldn't the AF be 0.21 (29 / (109 + 29))?
I don't understand.
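To make the arithmetic in this question explicit, here is a minimal Python sketch that pulls AD and AF out of the sample column of the chr4 record quoted above and computes the simple fraction alt / (ref + alt). The function name `ad_fraction` is made up for illustration, and the sketch only reproduces the naive calculation; it does not show how Mutect2 derives its AF value.

```python
# FORMAT and sample strings copied from the chr4:1809127 record above.
FORMAT = "GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1"
SAMPLE = "0/1:109,29:0.078:0:0:.:270,103:0:0"


def ad_fraction(format_keys, sample_values):
    """Return (alt fraction computed from AD, AF value reported in the record)."""
    # Pair each FORMAT key with its value in the sample column.
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    ref_count, alt_count = (int(x) for x in fields["AD"].split(","))
    return alt_count / (ref_count + alt_count), float(fields["AF"])


from_ad, reported = ad_fraction(FORMAT, SAMPLE)
print(f"AD-based fraction: {from_ad:.3f}, reported AF: {reported:.3f}")
```

Running this on other records in the same way would show whether the gap between the AD-based fraction and the reported AF is specific to this site.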

  2. The base counts of the Mutect bamout also differ from the Mutect VCF result.
    Below is the VCF result:
    chr12 93966398 . A C . PASS ECNT=1;HCNT=20;MAX_ED=.;MIN_ED=.;NLOD=0.00;TLOD=12.85 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:90,32:0.256:0:0:.:1740,509:0:0

Below is the base count (GATK) result for the Mutect bamout file:

chr12:93966398 186 93.00 58 A:38 C:20 G:0 T:0 N:0 128 A:96 C:32 G:0 T:0 N:0

The VCF result indicates ref (A) = 90 and alt (C) = 32, but the Mutect bamout file shows different base counts (A = 96, C = 32).
Why do the base counts differ between the VCF and the Mutect BAM?
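The comparison described above can be sketched in Python as follows. This assumes that the last five base-count columns of the DepthOfCoverage line belong to the bamout sample being compared (they are the ones matching A = 96, C = 32); the helper names `last_base_counts` and `compare` are hypothetical.

```python
# DepthOfCoverage line and AD value copied from the chr12:93966398 example above.
DOC_LINE = "chr12:93966398 186 93.00 58 A:38 C:20 G:0 T:0 N:0 128 A:96 C:32 G:0 T:0 N:0"
VCF_AD = "90,32"  # AD from the VCF record: ref (A), alt (C)


def last_base_counts(doc_line):
    """Parse the final five BASE:COUNT tokens of a DepthOfCoverage line."""
    tokens = doc_line.split()[-5:]
    return {base: int(count) for base, count in (t.split(":") for t in tokens)}


def compare(doc_line, ad, ref, alt):
    """Return base count minus AD for the ref and alt alleles."""
    counts = last_base_counts(doc_line)
    ref_ad, alt_ad = (int(x) for x in ad.split(","))
    return {ref: counts[ref] - ref_ad, alt: counts[alt] - alt_ad}


diff = compare(DOC_LINE, VCF_AD, "A", "C")
print(diff)
```

A per-allele difference like this makes it easier to see that only the ref count disagrees at this site, while the alt count is the same in both outputs.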

Please answer my questions.

Thanks.

yh

P.S. I used Mutect2 and GATK 3.6 (DepthOfCoverage).

Problem with LiftoverVcf

This is my first time running LiftoverVcf. I have seen that many other users ran into difficulties similar to mine, but not exactly the same. I'm trying to convert a VCF file from hg18 to hg19.
COMMAND:

java -jar picard.jar LiftoverVcf I=input.vcf O=out.chr21.vcf CHAIN=hg18ToHg19.over.chain REJECT=rejected_variants.chr21.vfc R=hg19.fasta

The inputs seem to be fine, and there appears to be no problem with either the VCF or the reference.

This is the error message:

INFO 2017-10-26 23:33:39 LiftoverVcf Loading up the target reference genome.
[Thu Oct 26 23:33:50 BRST 2017] picard.vcf.LiftoverVcf done. Elapsed time: 0.19 minutes.
Runtime.totalMemory()=2941779968
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at htsjdk.samtools.reference.FastaSequenceFile.readSequence(FastaSequenceFile.java:133)
at htsjdk.samtools.reference.FastaSequenceFile.nextSequence(FastaSequenceFile.java:83)
at htsjdk.samtools.reference.ReferenceSequenceFileWalker.get(ReferenceSequenceFileWalker.java:93)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:188)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

Any idea how to solve it?

Use VCF from HaplotypeCaller as normal_panel with Mutect2 (GATK4.beta.6)?

Hi,

I am trying to find somatic mutations in blood samples. The same samples were used previously to detect germline variants with HaplotypeCaller. Does it make sense to use the VCF obtained for a given sample from HaplotypeCaller as the --normal_panel parameter with Mutect2 in order to detect only somatic variants in that sample? Or should I use another parameter to pass on the germline variant list?

Many thanks for your answer,

Olivier

Error in SortVcf

I keep running into this problem. Is it very common, and is there a solution to this error?

Exception in thread "main" java.lang.IllegalStateException: Key . found in VariantContext field INFO at 10:153837 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:173)
at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:112)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:224)
at picard.vcf.SortVcf.writeSortedOutput(SortVcf.java:187)
at picard.vcf.SortVcf.doWork(SortVcf.java:105)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
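
The exception above complains about an INFO key (here ".") that has no matching ##INFO line in the header. A minimal Python sketch for locating such records is shown below. It is not part of Picard; the function name `undefined_info_keys` and the sample VCF text are made up for illustration (only the position 10:153837 is taken from the error message), and it assumes a plain-text, tab-delimited VCF.

```python
import re

# Hypothetical three-record VCF; the second record has a stray "." INFO key.
SAMPLE_VCF = """\
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
10\t153000\t.\tA\tG\t50\tPASS\tDP=12
10\t153837\t.\tC\tT\t50\tPASS\tDP=8;.
10\t154000\t.\tG\tA\t50\tPASS\t.
"""


def undefined_info_keys(vcf_text):
    """Return (chrom, pos, key) for every INFO key not declared in the header."""
    declared = set()
    problems = []
    for line in vcf_text.splitlines():
        if line.startswith("##INFO="):
            # Collect the ID of each declared INFO field.
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                declared.add(match.group(1))
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            chrom, pos, info = cols[0], int(cols[1]), cols[7]
            if info == ".":
                continue  # a lone "." means the INFO column is empty
            for entry in info.split(";"):
                key = entry.split("=", 1)[0]
                if key not in declared:
                    problems.append((chrom, pos, key))
    return problems


print(undefined_info_keys(SAMPLE_VCF))
```

For a real file, the text could be read with `open(path).read()` (or `gzip.open` for a compressed VCF) and passed to the same function to list the offending positions.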
