Generating a vcf with the information of specific genome positions (hotspots)
Hello, I'm developing a pipeline that needs to take into account the information about variants that are present on a list of hotspots on the genome, because my final analysis uses the information...
Merging population vcf files without gvcf
Hi Everyone, I have two separate raw VCF datasets processed by GATK version 3.5 (one from a population of ~2600 and one from a population of ~160). Since the upstream data cleaning and processing...
HaplotypeCaller Incompatible Contigs DNASeq
I'm using GATK 4.0.11 and I'm getting the following error message when I run HaplotypeCaller on DNAseq data: 10:19:17.089 INFO HaplotypeCaller -...
Mutect on mm10
Hello, I am trying to run mutect on mouse, and getting the following error ERROR MESSAGE: Unable to parse header with error: Your input file has a malformed header: VCFv4.2 is not a supported version,...
Analysis Pipeline Discrepancy in SNP Calling and Coverage
Hi all, I am new to GATK, so please bear with me... Essentially, I have developed a Unix script to analyze the FASTQ sequencing output for a novel targeting technique. I am only targeting 27 SNPs...
Removing "chr" from CHROM field
Hello! I intend to use a training resource VCF that contains "chr" in the CHR field (Reference obtained from UCSC), which is incompatible with my raw call set (reference obtained from ensembl). I check...
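UCSC-style references name contigs with a "chr" prefix (chr1), while Ensembl uses bare names (1). The following is a minimal text-level Python sketch, not a GATK tool. It assumes an uncompressed VCF that is read line by line, and it strips the prefix from `##contig` header lines and the CHROM column. The function name `strip_chr_prefix` is hypothetical. Renaming alone does not guarantee that the two references match (UCSC chrM versus Ensembl MT is one example), so compare the sequence dictionaries afterwards.

```python
import re

# Matches the start of a ##contig header line whose ID begins with "chr".
CONTIG_CHR = re.compile(r"^(##contig=<ID=)chr")


def strip_chr_prefix(lines):
    """Yield VCF lines with a leading "chr" removed from contig names.

    Rewrites ##contig=<ID=chr...> header lines and the CHROM column of data
    lines; all other lines pass through unchanged. Note that "chrM" becomes
    "M", not Ensembl's "MT", so some contigs may still need manual mapping.
    """
    for line in lines:
        if line.startswith("##"):
            yield CONTIG_CHR.sub(r"\1", line)
        elif line.startswith("#"):
            yield line  # the #CHROM column-header line is left as-is
        elif line.startswith("chr"):
            yield line[3:]  # data line: drop "chr" from the CHROM field
        else:
            yield line
```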
SelectVariants Starts Traversal but Does not Progress, High CPU Usage
Hi, I am using the GATK tool SelectVariants to only select variants that have passed FilterMutectCalls. Both FilterMutectCalls and Mutect2 were run in multi-sample mode, so the VCF being input to...
GATK v4.1.0.0 ValidateVariants, gVCF mode, error; none in v4.0.11.0
GATK v4.0.11.0 & v4.1.0.0, linux server, bash Hi, I was running the following code ${GATK4} --java-options '-Xmx10g -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -XX:ConcGCThreads=1...
VCF generation for somatic SNVs without "Normal Sample"
Hi there, I am trying to follow the GATK tutorial for somatic mutations for GATK4, but my data does not quite match what the example is doing. Article name: (How to) Call somatic mutations using GATK4...
How to merge the sample_X_genotyped_intervals.vcf files created by...
How to merge the sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls into a multi-sample VCF file? The files all have the same bins/records, so it should be easy to create a...
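This is a minimal Python sketch of the "same records, just paste the sample columns" idea. It assumes uncompressed single-sample VCFs that list identical records in the same order, as the question states. The function name `merge_same_sites` is hypothetical and is not a GATK tool. Each site is checked on CHROM, POS, REF, ALT and FORMAT. The `##` meta lines and the ID/QUAL/FILTER/INFO columns are taken from the first file, so any INFO values that summarize samples would need recomputing.

```python
def _site_key(record):
    """Fields that must agree across files: CHROM, POS, REF, ALT, FORMAT."""
    return (record[0], record[1], record[3], record[4], record[8])


def merge_same_sites(vcfs):
    """Merge single-sample VCFs (each a list of text lines) with identical records."""
    meta = [l.rstrip("\n") for l in vcfs[0] if l.startswith("##")]
    samples, bodies = [], []
    for lines in vcfs:
        header = next(l for l in lines if l.startswith("#CHROM"))
        samples.extend(header.rstrip("\n").split("\t")[9:])
        bodies.append([l.rstrip("\n").split("\t")
                       for l in lines if not l.startswith("#")])
    if len({len(b) for b in bodies}) != 1:
        raise ValueError("input files have different numbers of records")

    fixed = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    merged = meta + ["\t".join(fixed + samples)]
    for records in zip(*bodies):
        first = records[0]
        if any(_site_key(r) != _site_key(first) for r in records[1:]):
            raise ValueError(f"records differ at {first[0]}:{first[1]}")
        # Site columns from the first file, then every file's sample columns.
        merged.append("\t".join(first[:9] + [v for r in records for v in r[9:]]))
    return merged
```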
GATK4.1.0.0 HaplotypeCaller VCF has 0/1 and 0|1 genotypes. How to distinguish...
In the previous version (GATK-4.0.3.0), there were only 0/1 genotypes, no 0|1. In the latest version (GATK-4.1.0.0), there are both 0/1 and 0|1 genotypes. What is the difference between "/" and "|", why is...
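In the VCF specification, "/" separates unphased alleles in GT and "|" separates phased ones. The allele indices are the same either way. This small Python sketch shows how a downstream script might read both forms. It assumes that each GT uses a single separator type, and the helper name `parse_gt` is hypothetical.

```python
def parse_gt(gt):
    """Split a VCF GT string into allele indices and a phased flag.

    "/" means unphased and "|" means phased; "." (a missing allele)
    becomes None. Assumes one separator type per GT value.
    """
    phased = "|" in gt
    alleles = gt.split("|" if phased else "/")
    return [None if a == "." else int(a) for a in alleles], phased
```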
Asterisk in some lines of my vcf file
Dear all, I ran HaplotypeCaller to find germline variants in my samples (808 samples), but in the ALT column I found "*" in some lines, and I don't know what it means... (I follow...
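VCF 4.2 reserves the "*" ALT allele to mean that an allele is missing because of an upstream deletion that spans the site. GATK calls this a spanning deletion. The following minimal Python sketch flags such records in a tab-separated data line. The helper names are hypothetical.

```python
def alt_alleles(vcf_line):
    """Return the ALT alleles of a tab-separated VCF data line."""
    return vcf_line.rstrip("\n").split("\t")[4].split(",")


def has_spanning_deletion(vcf_line):
    """True if "*" (allele missing due to an upstream deletion) is among the ALTs."""
    return "*" in alt_alleles(vcf_line)
```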
Oncotator for build hg38
The current version of Oncotator on the Broad servers is v1.9.0.0 and indicates that only hg19 is supported, are there plans to extend the tool to hg38? If not are there other recommended tools to...
Annotation problem: not all variants are taken into account
Hello, I use GATK version 4.1 to annotate a vcf with the following command : java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true...
ERROR MESSAGE: Your input file has a malformed header
Hello, I want to annotate my file through gatk 3.8 but I get this error: MESSAGE: Your input file has a malformed header: there are not enough columns present in the header line: #CHROM POS ID REF ALT...
Unable to merge gvcf files
I have used the command to merge gvcfs file using GATK java -jar /opt/apps/gatk/3.7/GenomeAnalysisTK.jar -T CombineGVCFs -R...
Mapping a locus to a chromosome in a genome
Hello, I am currently working on maize mutant and wild-type data. I have the DNA-Seq data for these samples and I am currently using the GATK pipeline to analyze the data. I am trying to map the...
Transform VCF file changing only variant ids
Hello, I want to read a VCF file and write out another VCF file that is equivalent to the first one except that variant ids have been changed. How would I do that? A VCFFileReader gives me an iterator...
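The question refers to a `VCFFileReader`. As a language-neutral alternative, this is a minimal plain-text Python sketch and does not use that reader's API. It rewrites only the ID column (the 3rd field) and leaves every other byte of each record unchanged. It assumes an uncompressed VCF. `rename_ids` and the CHROM_POS_REF_ALT scheme in `chrom_pos_ref_alt` are hypothetical choices for illustration.

```python
def rename_ids(lines, make_id):
    """Yield VCF lines with the ID column (3rd field) replaced on data lines.

    `make_id` receives a record's list of fields and returns the new ID;
    header lines (starting with "#") pass through unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line.rstrip("\n")
            continue
        fields = line.rstrip("\n").split("\t")
        fields[2] = make_id(fields)
        yield "\t".join(fields)


def chrom_pos_ref_alt(fields):
    """Hypothetical ID scheme: CHROM_POS_REF_ALT."""
    return "_".join([fields[0], fields[1], fields[3], fields[4]])
```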
Correct GATK4 tools to use for combining scattered gVCFS and VCFs from...
I am running GATK on non-human data and am trying to follow the best practices as much as possible. I've now hit two separate roadblocks, both addressing similar issues: 1) Combining scattered gVCFS...
How to set GVCF genotypes to ./. based on the GQ score
Hi, I have a reasonably large non-human multi-VCF dataset containing ~280 samples and ~70M variants. I want to filter low quality genotype calls (but not variants as a whole). This does not seem to be...
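The following is a minimal plain-Python sketch of per-genotype masking, and it is not a GATK tool. For each sample whose GQ falls below a threshold, it replaces that sample's GT with a no-call, keeps the same ploidy and separator, and leaves the site and the other FORMAT values untouched. The threshold of 20 and the name `mask_low_gq` are hypothetical. INFO summaries such as AC/AN/AF are not recomputed.

```python
def mask_low_gq(line, min_gq=20):
    """Set GT to no-call for samples whose GQ is below `min_gq`.

    Works on one tab-separated VCF data line; other FORMAT values are kept.
    Samples with a missing or absent GQ are left untouched.
    """
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if "GT" not in keys or "GQ" not in keys:
        return "\t".join(fields)
    gt_i, gq_i = keys.index("GT"), keys.index("GQ")
    for s in range(9, len(fields)):
        vals = fields[s].split(":")
        if len(vals) <= gq_i or vals[gq_i] == ".":
            continue  # GQ absent or missing for this sample
        if float(vals[gq_i]) < min_gq:
            sep = "|" if "|" in vals[gt_i] else "/"
            ploidy = len(vals[gt_i].replace("|", "/").split("/"))
            vals[gt_i] = sep.join(["."] * ploidy)  # e.g. 0/1 -> ./.
            fields[s] = ":".join(vals)
    return "\t".join(fields)
```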
Spurious insertions being called by HaplotypeCaller?
I just called variants from the same bwa-generated bam file using 1) samtools/bcftools and 2) HaplotypeCaller. Downstream analysis indicated that a particular locus looked interesting, but only in the...
Graphical (GUI) and interactive exploration tool for large genotype matrices...
Dear GATK development team and GATK users, What is currently the best visual (GUI) and interactive genotype matrix exploration tool (a browser) for large genotype matrices, say the 1000 human genomes...
GATK4: Can't get correct CalculateContamination result
Question regarding CalculateContamination (GATK/4.1.2.0): With CalculateContamination in tumor-matched mode, I get: contamination error NaN 1.0 When I look at the tumor.table and normal.table files...
Use mutect2 or UnifiedGenotyper?
Hi, I am currently working on finding the somatic mutations in a tumor (using single-cell sequencing results). My final goal is to use these somatic mutations from different regions to build...
AD allele depth interpretation
Hello, I have a query on the interpretation of the AD variable in a vcf generated by calling about 800 samples together. The header defines it as: ##FORMAT= and the forum further elaborates: AD is the...
Variant filtration by allelic balance bias
Hi everyone, I'm trying to find a way to filter some heterozygous genotypes that might have been misassigned due to PCR or sequencing errors and result in a very unrealistic allelic balance bias like...
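One simple way to express allelic balance is the fraction of reads in AD that support the ALT allele in a heterozygous call. This minimal Python sketch flags biallelic 0/1 calls whose balance falls outside a window. The 0.2 to 0.8 window and the function names are hypothetical choices, not GATK defaults. The sketch also assumes that AD lists REF first, then ALT.

```python
def allele_balance(ad):
    """Fraction of reads supporting the first ALT allele, from an AD string.

    Returns None when AD is missing, has no ALT value, or sums to zero.
    """
    if ad in (".", ""):
        return None
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    return depths[1] / total if total and len(depths) > 1 else None


def is_unbalanced_het(gt, ad, low=0.2, high=0.8):
    """Flag a 0/1 (or 0|1) call whose allele balance falls outside [low, high]."""
    if sorted(gt.replace("|", "/").split("/")) != ["0", "1"]:
        return False  # only heterozygous REF/ALT1 calls are checked
    ab = allele_balance(ad)
    return ab is not None and not (low <= ab <= high)
```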
SelectVariants - java.lang.IllegalStateException: Allele in genotype not in...
Hi everyone, I'm trying to select variants with SelectVariants but for some reason it stops saying that Allele in genotype CT* not in the variant context [CT*, C]. I tried to find a CT* in the VCF...
understand HaplotypeCaller output vcf format
Hi there, I am using GATK4.1.0.0 version on germline pair-end illumina WGS data with following command: ``` gatk4.1.0.0 --java-options '-Xmx5G' HaplotypeCaller -R...
Problem with a BED file and the flag -alleles (HaplotypeCaller)
I'm trying to pass the flag -gt_mode GENOTYPE_GIVEN_ALLELES by giving a list of SNPs in a BED file. This BED file has 4 columns (chromosome, initial position, final position, allele, allele). However...
SelectVariants by sample names file
I need to subset a list of samples from a large vcf.gz file. The sample names were saved in a plain text file, one name per line. I used -RF -sf my.sample.names.txt but kept getting an error. Any...
GATK4: RMSMappingQuality results differ between v4.0.0.0 and v4.1.1.0
Good morning everybody, and thanks in advance for your advice and help. I checked for this problem before submitting this question; I hope this is not a duplicate. We are working with whole genome...
Why are the results of variant callers (GATK3.8 and GATK 4.0) different?
Hello, I am a beginner. I used two different tools to analyze my data but got two different results. Why?
GenomicsDBImport too slow!
Dear all, I'm running GenomicsDBImport for 30 samples. It takes so much time that after 3 days the job was killed for exceeding the walltime limit. I want to ask if there is a way to make it faster. I...
The bamout file results are inconsistent with the VCF file results
Hi, I analyzed my data with GATK4 and found multiple mutations at one site in the "bamout" file: from G to T, supported by 14 reads, and from G...
HaplotypeCaller output modes EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES not...
Dear GATK-Team, First of all, thank you for your great support and constant development of GATK! I was very pleased to see that the output mode options EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES were...
Using GATK SelectVariants to filter based on calculated allele frequency
Many of the variant callers I use, such as Pindel, do not include the AF or allele frequency value in the vcf output. However I still need to filter the vcf based on the allele frequencies of the Tumor...
How to select samples that are polymorphic on a specific locus from a joint...
Hi, I am trying to select the samples that are polymorphic on a specific locus from a joint genotyped vcf file using SelectVariants tool and JEXL expressions with no success. The command I am trying to...
Don't understand the values in VCF file
Hi, I am not sure if this is the right place to ask. I got a VCF file and need some help understanding it. In the last two columns, there are some rows like: GT:CNADJ 0|1:2, I know 0|1...
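In a VCF, the FORMAT column lists colon-separated keys, and each sample column gives values in the same order. For example, "GT:CNADJ" with "0|1:2" pairs GT with 0|1 and CNADJ with 2. Each key's meaning is defined in the file's own `##FORMAT=<ID=...>` header line. This minimal Python sketch pairs keys with values. The helper name `sample_fields` is hypothetical.

```python
def sample_fields(format_col, sample_col):
    """Pair FORMAT keys with one sample's colon-separated values.

    VCF allows trailing values to be dropped; zip() simply leaves those
    keys out of the result.
    """
    return dict(zip(format_col.split(":"), sample_col.split(":")))
```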
How to get a smaller list of de novo SNPs between 3 genotypes
Hello. I am currently working on a maize whole-genome dataset and I have 3 samples: WT, MT and B73. I obtained the VCF files for all 3 datasets using HaplotypeCaller. However, the list of SNPs that...
ASEReadCounter not accepting VCF file as input
I'm trying to run ASEReadCounter, but it's not accepting a VCF file as input. I'm getting the following error: ##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command...
Mutect2 - java.lang.IllegalArgumentException: Cannot construct fragment from...
Hi, trying the latest version of Mutect2 4.1.4.0 java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false...
Non-reference alleles not called into VCF by HC
Hi GATK team, I ran HC joint calling and found that some non-reference alleles didn't appear in the VCF. Here is what these sites look like: Most of these alleles have VAF = 1 and reside on the...
How to diagnose missing MQRankSum annotations (when BaseQRankSum is available)
We wish to discover short variants in a cohort of 60 plant whole-genome-samples. We're blocked on VariantRecalibrator. We have a VCF truth set (aka resource) of SNPs which has been computed beforehand...
Extracting MQ and QUAL values for invariant sites in VCF files
I'm having problems getting mapping quality (MQ) values and PHRED called site quality scores (QUAL) for invariant sites in the VCF files generated by GATK, even when I specify that all sites should be...
How to identify duplicated genes in VCF file obtained after GATK pipeline?
I am working to find which gene type is more duplicated. I had mapped and annotated my VCF file by GATK pipeline. Please guide me how to proceed now.
GT and AD
If my VCF indicates GT is 1/1 and AD=14,4, what does 14,4 indicate? 14 reads of the ALT and 4 that were not? Or something else? Thank you.
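Assuming the usual one-value-per-allele AD layout (REF first, then each ALT in the order of the ALT column), the values can be labeled by allele. Under that assumption, AD=14,4 at a site with a single ALT would read as 14 reads for REF and 4 for ALT. This minimal Python sketch does the labeling. The function name `allele_depths` is hypothetical.

```python
def allele_depths(ref, alt, ad):
    """Map each allele to its read depth from an AD value.

    Assumes AD has one value per allele in VCF order: REF first, then each
    ALT as listed in the ALT column.
    """
    alleles = [ref] + alt.split(",")
    depths = [int(x) for x in ad.split(",")]
    if len(depths) != len(alleles):
        raise ValueError(f"AD has {len(depths)} values for {len(alleles)} alleles")
    return dict(zip(alleles, depths))
```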
Where can I find dbsnp_144.hg38.vcf.gz
I'm installing an application that uses files from the GATK resource bundle. I found all of the needed files at...
Are there issues with using reads coming from different technologies and...
Hello! We are analyzing a WGS data of 60 samples (6 groups, 10 samples/group) produced by HiSeq4000. The mean coverage per sample is 25x (lowest sample is 15x). Now we realized we need to sequence more...
How to add sample names in VCF?
I am using the GATK best practices for germline SNPs and Indels (4.1.2.0). After mapping and recalibration, I run HaplotypeCaller in GVCF mode. I am combining all vcf files (output from HaplotypeCaller)...
Missing PS field in the VCF file produced by GenotypeGVCFs
Hello, I followed GATK best practices to produce a VCF file for 20 individuals. GATK version is 4.1.0.0. The BAM files were all verified by ValidateSamFile, no errors or warnings were detected....