alleleTools.format.vcf module
VCF File Handling Module.
This module provides a VCF (Variant Call Format) class for reading, parsing, and manipulating VCF files containing genetic variant data, particularly optimized for HLA and KIR allele data.
- class alleleTools.format.vcf.VCF(path)[source]
Bases: object
A class for handling VCF (Variant Call Format) files.
This class provides methods to read, parse, and manipulate VCF files, with specific functionality for handling allele data from polymorphic genes like HLA and KIR.
- metadata
VCF header metadata lines
- Type:
str
- dataframe
Main VCF data with ID as index
- Type:
pd.DataFrame
- Parameters:
path (str) – Path to the VCF file to read
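The two attributes above can be pictured with a small standard-library sketch. This is not the class's implementation, which is not shown here: `split_vcf` and the sample VCF text are hypothetical, and a plain dict keyed by ID stands in for the ID-indexed `pd.DataFrame`.

```python
import io

# Hypothetical minimal VCF content, for illustration only.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_1\n"
    "6\t29910247\tHLA_A*01:01\tA\tP\t.\tPASS\t.\tGT\t0|1\n"
)


def split_vcf(handle):
    """Separate '##' metadata lines from the column header and data rows."""
    metadata, header, rows = [], None, {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Header metadata lines, kept verbatim.
            metadata.append(line)
        elif line.startswith("#"):
            # The single '#CHROM ...' line names the columns.
            header = line.split("\t")
        elif line:
            record = dict(zip(header, line.split("\t")))
            # Key each row by its ID, mirroring an ID-indexed table.
            rows[record["ID"]] = record
    return "\n".join(metadata), rows


metadata, rows = split_vcf(io.StringIO(vcf_text))
print(metadata)
print(rows["HLA_A*01:01"]["POS"])
```

With the class itself, the equivalent step would presumably be `VCF(path)` followed by reading `vcf.metadata` and `vcf.dataframe`.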
- get_format()[source]
Extract the format field specification from the VCF.
Parses the FORMAT column to determine the structure of genotype information fields (e.g., GT:DS:AA:AB:BB).
- Returns:
List of format field names in order
- Return type:
List[str]
Example
>>> vcf.get_format()
['GT', 'DS', 'AA', 'AB', 'BB']
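To show what parsing the FORMAT column involves, here is a minimal sketch on a single raw VCF data line. The helper `format_fields` and the record are hypothetical; the method itself works on the class's dataframe and may differ in detail.

```python
from typing import List

# Hypothetical tab-separated VCF data line, for illustration only.
record = (
    "6\t29910247\tHLA_A*01:01\tA\tP\t.\tPASS\t.\t"
    "GT:DS:AA:AB:BB\t0|1:1.0:0.0:1.0:0.0"
)


def format_fields(line: str) -> List[str]:
    """Return the FORMAT keys of one VCF data line, in order."""
    columns = line.rstrip("\n").split("\t")
    # FORMAT is the ninth fixed column (index 8) of a VCF data line.
    return columns[8].split(":")


fields = format_fields(record)
print(fields)
```

Each sample column uses the same colon-separated order, so the returned list can serve as the key order when splitting a genotype string such as `0|1:1.0:0.0:1.0:0.0`.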
- remove_id_prefix(prefix: str)[source]
Remove a prefix from allele IDs in the dataframe.
This is commonly used to remove gene prefixes like ‘HLA_’ or ‘KIR_’ from allele identifiers to standardize naming.
- Parameters:
prefix (str) – The prefix string to remove from allele IDs
Example
>>> vcf.remove_id_prefix("HLA_") # "HLA_A*01:01" becomes "A*01:01"
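The effect on individual identifiers can be sketched with plain strings. The helper `strip_prefix` is hypothetical, and leaving IDs without the prefix unchanged is an assumption about the method's behaviour, not something the documentation states.

```python
def strip_prefix(allele_id: str, prefix: str) -> str:
    """Remove `prefix` from the start of `allele_id` if it is present."""
    if allele_id.startswith(prefix):
        return allele_id[len(prefix):]
    # Assumption: IDs that lack the prefix are returned unchanged.
    return allele_id


# Hypothetical allele IDs, for illustration only.
ids = ["HLA_A*01:01", "HLA_B*07:02", "KIR_2DL1*001"]
stripped = [strip_prefix(allele_id, "HLA_") for allele_id in ids]
print(stripped)
```

Only a leading match is removed here, so an ID that merely contains `HLA_` further along would be left as it is.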
- samples()[source]
Get the set of sample column names from the VCF.
Returns all column names that are not among the standard fixed VCF columns (i.e., the sample-specific genotype columns).
- Returns:
Set of sample column names
- Return type:
set
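One way to picture this selection is a set difference against the fixed VCF columns. The sketch below works on a raw header line; `sample_columns` and the sample names are hypothetical, and the actual method operates on the class's dataframe.

```python
# The nine fixed columns that precede the sample columns in a VCF header.
FIXED_COLUMNS = {
    "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
}

# Hypothetical header line with two sample columns, for illustration only.
header = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
    "sample_1\tsample_2"
)


def sample_columns(header_line: str) -> set:
    """Return the column names that are not fixed VCF columns."""
    return set(header_line.rstrip("\n").split("\t")) - FIXED_COLUMNS


names = sample_columns(header)
print(sorted(names))
```

Because the result is a set, it carries no column order; sort it, or filter the original column list, when the order of samples in the file matters.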