alleleTools.format.vcf2allele module
VCF to Allele Table Conversion Module.
This module converts VCF (Variant Call Format) files containing HLA or KIR allele data into allele tables. It supports various output formats including pyHLA and PyPop compatible formats.
- Usage:
To generate the input file from imputation, run:
# Extract only relevant alleles
bcftools view --include 'ID~"HLA"' IMPUTED.vcf > HLA.vcf
# Convert the extracted alleles to a table
altools convert vcf2allele HLA.vcf --output out.alt
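For readers without bcftools at hand, the extraction step can be approximated in plain Python. This is a minimal sketch, not part of alleleTools: the helper name `filter_vcf_by_id` and the example records are hypothetical, and it assumes an uncompressed, tab-separated VCF in which the ID is the third column. Like the `ID~"HLA"` expression, it keeps records whose ID matches the regular expression `HLA`.

```python
import re


def filter_vcf_by_id(lines, pattern="HLA"):
    """Keep header lines and records whose ID column matches `pattern`.

    Hypothetical helper for illustration; assumes tab-separated VCF text.
    """
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        if line.startswith("#"):
            # Metadata and column-header lines pass through unchanged
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # The ID is the third VCF column (index 2)
        if len(fields) > 2 and regex.search(fields[2]):
            kept.append(line)
    return kept


# Made-up records, for illustration only
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "6\t29910247\tHLA_A*01:01\tA\tT\t.\tPASS\t.\tGT\t0|1\n",
    "6\t29910300\trs123\tG\tC\t.\tPASS\t.\tGT\t0|0\n",
]
filtered = filter_vcf_by_id(example)
```

For compressed or indexed files, bcftools remains the more practical choice.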
Author: Nicolás Mendoza Mejía (2023)
- class alleleTools.format.vcf2allele.VCFalleles(alleles: DataFrame, formats: DataFrame)
Bases: object
Class for processing and analyzing VCF allele data.
This class handles the parsing and analysis of allele information from VCF format data, including genotype determination, resolution analysis, and ploidy-based filtering.
- df
Processed allele data with genotype information indexed by gene and allele names.
- Type:
pd.DataFrame
- Parameters:
alleles (pd.DataFrame) – Raw allele data from VCF
formats (pd.DataFrame) – Format information from VCF header
- sort_and_fill(extensive=False)
Sort alleles by gene and determine final genotype calls.
For each gene, determines whether the genotype is homozygous or heterozygous and selects the appropriate alleles based on confidence scores and resolution.
- Parameters:
extensive (bool) – Whether to perform an extensive search for low-confidence alleles (default: False)
- Returns:
List of selected allele names for all genes
- Return type:
List[str]
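The per-gene selection described for sort_and_fill can be pictured with a simplified sketch. This is not the module's actual implementation: the function `pick_genotypes`, the `(gene, allele, dosage, confidence)` tuple layout, the tie-breaking by resolution, and the `"NA"` filler are all assumptions made for illustration. A dosage of 2 stands for a homozygous call, and under-called genes are padded to the assumed ploidy.

```python
from collections import defaultdict


def resolution(allele):
    # Number of colon-separated fields after the gene name, e.g. "A*01:01" -> 2
    return len(allele.split("*")[-1].split(":"))


def pick_genotypes(calls, ploidy=2, missing="NA"):
    """Select `ploidy` alleles per gene (hypothetical illustration).

    calls: iterable of (gene, allele, dosage, confidence) tuples.
    """
    by_gene = defaultdict(list)
    for gene, allele, dosage, confidence in calls:
        # A dosage of 2 (homozygous) contributes the same allele twice
        by_gene[gene].extend([(confidence, resolution(allele), allele)] * dosage)

    selected = []
    for gene in sorted(by_gene):
        # Prefer higher confidence, then higher resolution
        ranked = sorted(by_gene[gene], reverse=True)
        names = [allele for _, _, allele in ranked[:ploidy]]
        # Pad with a placeholder when fewer than `ploidy` alleles were called
        names += [missing] * (ploidy - len(names))
        selected.extend(names)
    return selected


# Made-up calls, for illustration only
calls = [
    ("A", "A*01:01", 2, 0.98),  # homozygous
    ("B", "B*07:02", 1, 0.91),
    ("B", "B*08:01", 1, 0.85),
    ("B", "B*08", 1, 0.60),     # lower confidence and resolution, dropped
    ("C", "C*07:01", 1, 0.70),  # only one allele called
]
selected = pick_genotypes(calls)
```

Here gene A yields the same allele twice, gene B keeps its two best-supported alleles, and gene C is padded with the placeholder.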
- alleleTools.format.vcf2allele.call_function(args)
Main function to execute VCF to allele table conversion.
This function orchestrates the conversion process by:
1. Loading and preprocessing the VCF file
2. Extracting genotype information
3. Converting to allele format
4. Adding phenotype data if provided
5. Writing the output file
- Parameters:
args – Parsed command line arguments containing:
- input: Path to input VCF file
- output: Path to output file
- rm_prefix: Prefix to remove from allele names
- separator: Separator between gene and allele names
- extensive_search: Whether to perform extensive allele search
- phe: Optional phenotype file path
- output_header: Whether to include header in output
- population: Population identifier for PyPop compatibility
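When calling the function from Python rather than from the command line, the arguments can presumably be supplied as an `argparse.Namespace` carrying the attributes listed above. The sketch below only builds such an object. Every value shown is an assumption chosen for illustration, not a documented default, and the call itself is left commented out because it requires alleleTools and a real input file.

```python
from argparse import Namespace

# Attribute names follow the documented arguments; values are illustrative guesses
args = Namespace(
    input="HLA.vcf",         # path to the input VCF file
    output="out.alt",        # path to the output allele table
    rm_prefix="HLA_",        # assumed prefix to strip from allele names
    separator="*",           # assumed separator between gene and allele names
    extensive_search=False,  # skip the extensive allele search
    phe=None,                # no phenotype file
    output_header=True,      # include a header in the output
    population=None,         # no population identifier
)

# Requires alleleTools to be installed and HLA.vcf to exist:
# from alleleTools.format.vcf2allele import call_function
# call_function(args)
```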