Channel: vcf — GATK-Forum

Difference in GenotypeGVCFs-generated VCF after consolidation with GenomicsDBImport and CombineGVCFs


Hi,

I had a set of 81 GVCFs that I consolidated in two ways, first with GenomicsDBImport and then with CombineGVCFs, and I ran GenotypeGVCFs in both cases. For GenomicsDBImport, I ran the command per contig and then ran GenotypeGVCFs on each database to get a per-contig VCF. I then used Picard GatherVcfs to make the final VCF. The commands I used are written below:

Using GenomicsDBImport:
java -Xmx90g -jar gatk-package-4.0.4.0-local.jar GenomicsDBImport -R water_buffalo_re_arranged_chrom_ref_genome.fa --TMP_DIR ./tmp --sample-name-map sample_names_map_new.txt --reader-threads 2 --genomicsdb-workspace-path "$contig" -L "$contig"

java -Xmx8G -XX:ConcGCThreads=1 -jar gatk-package-4.0.4.0-local.jar GenotypeGVCFs -R /water_buffalo_re_arranged_chrom_ref_genome.fa -new-qual -V gendb://"$contig" -O "$contig"_variants.vcf.gz

java -jar picard.jar GatherVcfs INPUT=list.txt OUTPUT=Final_med_buffalo_variants_81_samples.vcf.gz
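Picard GatherVcfs expects its inputs in genomic order, so the order of the per-contig files in list.txt matters. A minimal sketch of how such a list could be built is below. It assumes a hypothetical contigs.txt with one contig name per line in reference order; that file and the function name make_gather_list are illustrative only and are not part of the run described above.

```shell
# Sketch only: contigs.txt and make_gather_list are hypothetical names.
# Writes one "<contig>_variants.vcf.gz" path per contig, preserving the
# order of contigs.txt (which is assumed to follow the reference order).
make_gather_list() {
  contigs_file=$1
  out_list=$2
  : > "$out_list"                      # start with an empty list file
  while IFS= read -r contig; do
    [ -n "$contig" ] || continue       # skip blank lines
    printf '%s_variants.vcf.gz\n' "$contig" >> "$out_list"
  done < "$contigs_file"
}

# Example (not executed here):
# make_gather_list contigs.txt list.txt
```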

Using CombineGVCFs:
java -Xmx200g -XX:ConcGCThreads=1 -jar gatk-package-4.0.4.0-local.jar CombineGVCFs -R water_buffalo_re_arranged_chrom_ref_genome.fa --variant All_gvcf_gz.list -O combined_81.g.vcf.gz

java -Xmx8G -XX:ConcGCThreads=1 -jar gatk-package-4.0.4.0-local.jar GenotypeGVCFs -R water_buffalo_re_arranged_chrom_ref_genome.fa -new-qual -V combined_81.g.vcf.gz -O Final_variants_81_samples_using_CombineGVCF.vcf.gz

The final VCF should be the same in both cases. Unfortunately, it was not. On running bcftools isec, I found that some variants were unique to one VCF and some to the other. What could be the reason behind this discrepancy?
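For reference, a rough sketch of how the isec result can be summarised is below. With two inputs and the -p option, bcftools isec writes 0000.vcf (records private to the first file), 0001.vcf (private to the second) and 0002.vcf (records from the first file shared by both). The helper names count_records and summarise_isec and the output directory isec_out are hypothetical; no counts from the actual run are shown.

```shell
# Sketch only: helper names and isec_out are hypothetical.
# Count non-header records in an uncompressed VCF.
count_records() {
  grep -vc '^#' "$1"
}

# Summarise a directory written by `bcftools isec -p <dir> A.vcf.gz B.vcf.gz`.
summarise_isec() {
  dir=$1
  echo "private_to_first=$(count_records "$dir/0000.vcf")"
  echo "private_to_second=$(count_records "$dir/0001.vcf")"
  echo "shared=$(count_records "$dir/0002.vcf")"
}

# Example (not executed here; both inputs must be bgzipped and indexed):
# bcftools isec -p isec_out \
#   Final_med_buffalo_variants_81_samples.vcf.gz \
#   Final_variants_81_samples_using_CombineGVCF.vcf.gz
# summarise_isec isec_out
```

Looking at which contigs and positions the private records fall on may help narrow down where the two workflows diverge.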

Kindly let me know if you need more information.

