alleleTools.allele module
- class alleleTools.allele.Allele(gene: str, fields: List[str], confidence: str = '', gene_delimiter: str = '*', field_delimiter: str = ':', suffix: str = '')[source]
Bases:
object
Class representing an HLA allele with parsing and comparison capabilities.
This class handles the parsing of HLA allele nomenclature and provides methods for comparison, resolution management, and string representation. It supports confidence scores from algorithms like HiSat.
- gene
Gene name (e.g., ‘A’, ‘B’, ‘DRB1’)
- Type:
str
- fields
List of allele fields (e.g., [‘01’, ‘01’, ‘01’])
- Type:
List[str]
- confidence
Optional confidence score from genotyping algorithm
- Type:
float
- Parameters:
code (str | list) – Allele string to parse (e.g., ‘A*01:01:01’ or ‘A*01:01(0.95)’). Alternatively, pass a list of fields together with the gene argument.
gene (str) – Gene name. Pass it here if it is not present in code.
- Raises:
Exception – If the allele string cannot be parsed
Example
>>> allele = Allele('A*01:01:01')
>>> print(allele.gene)    # 'A'
>>> print(allele.fields)  # ['01', '01', '01']
>>> print(len(allele))    # 3
- compare(allele: Allele) AlleleMatchStatus[source]
Compare this allele with another allele.
Performs detailed comparison considering gene name, field values, and resolution levels.
- Parameters:
allele (Allele) – The allele to compare with
- Returns:
The result of the comparison
- Return type:
AlleleMatchStatus
Example
>>> a1 = Allele('A*01:01')
>>> a2 = Allele('A*01:01:01')
>>> a1.compare(a2)  # AlleleMatchStatus.LESS_RESOLUTION (a1 has fewer fields than a2)
- class alleleTools.allele.AlleleMatchStatus(value)[source]
Bases:
Enum
Enumeration for allele comparison results.
Defines the possible outcomes when comparing two alleles:
- NOT_EQUAL: Alleles are different (different genes or field values)
- EQUAL: Alleles are identical in gene and all fields
- LESS_RESOLUTION: Current allele has fewer fields than the compared allele
- MORE_RESOLUTION: Current allele has more fields than the compared allele
- EQUAL = 2
- LESS_RESOLUTION = 3
- MORE_RESOLUTION = 4
- NOT_EQUAL = 1
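The comparison outcomes described above can be illustrated with a minimal, self-contained sketch. It does not use the library itself: MatchStatus and compare_fields are hypothetical stand-ins, and the rule that shorter alleles are compared on their shared leading fields is an assumption based on the enum descriptions, not a statement of how Allele.compare is implemented.

```python
from enum import Enum
from typing import List


class MatchStatus(Enum):
    """Hypothetical stand-in for AlleleMatchStatus, using the values listed above."""
    NOT_EQUAL = 1
    EQUAL = 2
    LESS_RESOLUTION = 3
    MORE_RESOLUTION = 4


def compare_fields(gene_a: str, fields_a: List[str],
                   gene_b: str, fields_b: List[str]) -> MatchStatus:
    """Compare allele A (the "current" allele) against allele B."""
    if gene_a != gene_b:
        return MatchStatus.NOT_EQUAL
    # Assumption: only the fields both alleles share are checked for equality.
    shared = min(len(fields_a), len(fields_b))
    if fields_a[:shared] != fields_b[:shared]:
        return MatchStatus.NOT_EQUAL
    if len(fields_a) < len(fields_b):
        return MatchStatus.LESS_RESOLUTION
    if len(fields_a) > len(fields_b):
        return MatchStatus.MORE_RESOLUTION
    return MatchStatus.EQUAL


print(compare_fields("A", ["01", "01"], "A", ["01", "01", "01"]))
print(compare_fields("A", ["01", "01"], "B", ["01", "01"]))
```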
- class alleleTools.allele.AlleleParser(gene_family: str, config_file: str = '')[source]
Bases:
object
Configurable allele parser that supports multiple parsing strategies. This class loads parsing configurations from a JSON file.
The configuration can be overwritten by providing a custom config file during initialization (config_file parameter).
- Use:
parser = AlleleParser(gene_family="hla", config_file="custom_config.json")
allele = parser.parse("A*01:02:03")
- class alleleTools.allele.DelimitedParser(gene_delimiter: str, field_delimiter: str)[source]
Bases:
ParsingStrategy
Parser for alleles using simple delimiters. Two parameters are needed:
gene_delimiter: character that separates the gene name from the fields
field_delimiter: character that separates the different fields
Example
For allele "A*01:02:03", gene_delimiter="*", field_delimiter=":"
- Results:
gene = "A"
fields = ["01", "02", "03"]
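A delimiter-based split of this kind can be sketched in a few lines. The function split_allele below is a hypothetical illustration and not part of alleleTools. Raising ValueError on malformed input is likewise an assumption.

```python
from typing import List, Tuple


def split_allele(code: str, gene_delimiter: str = "*",
                 field_delimiter: str = ":") -> Tuple[str, List[str]]:
    """Split an allele string into (gene, fields) using two delimiters."""
    gene, sep, rest = code.partition(gene_delimiter)
    # Reject strings with no gene delimiter, no gene name, or no fields.
    if not sep or not gene or not rest:
        raise ValueError(f"cannot parse allele: {code!r}")
    return gene, rest.split(field_delimiter)


gene, fields = split_allele("A*01:02:03")
print(gene, fields)
```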
- class alleleTools.allele.FieldTree(name: str)[source]
Bases:
object
Tree structure to represent the hierarchical organization of allele fields.
Each node in the tree corresponds to a field value at a particular position in the allele nomenclature. The tree is used to count and organize the occurrence of each field value across a set of alleles.
- field
The field value at this node.
- Type:
str
- support
Number of times this field value has been added at this position, which indicates how many times it has been genotyped by different tools.
- Type:
int
- representing subsequent fields.
Example
For alleles A*01:01 and A*01:02, the tree will have a root ‘A’, a child ‘01’, and two children ‘01’ and ‘02’ under it.
- add(fields: list, weight: float = 1.0)[source]
Add a sequence of fields to the tree, incrementing counts and creating nodes as needed.
- Parameters:
fields (list) – List of field values (str) to add as a path in the tree.
Example
tree.add([‘01’, ‘01’]) will add/increment nodes for ‘01’ at two levels.
- add_batch(batch: List[list], weight: float = 1.0)[source]
Helper method to append a list of field lists to the current tree.
- Parameters:
batch (List[list]) – A list containing lists of field values (str).
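The add and add_batch behaviour described above can be pictured with a small stand-in tree. FieldNode and its children attribute are hypothetical names chosen for this sketch. Storing support as a float so that it can accumulate the weight argument is also an assumption.

```python
from typing import Dict, List


class FieldNode:
    """Hypothetical stand-in for a FieldTree node."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.support = 0.0
        self.children: Dict[str, "FieldNode"] = {}  # assumed attribute name

    def add(self, fields: List[str], weight: float = 1.0) -> None:
        """Walk down the tree along `fields`, creating nodes as needed."""
        node = self
        for value in fields:
            child = node.children.get(value)
            if child is None:
                child = FieldNode(value)
                node.children[value] = child
            child.support += weight
            node = child

    def add_batch(self, batch: List[List[str]], weight: float = 1.0) -> None:
        """Add every field list in `batch` with the same weight."""
        for fields in batch:
            self.add(fields, weight)


tree = FieldNode("A")
tree.add_batch([["01", "01"], ["01", "02"]])
print(tree.children["01"].support)
print(sorted(tree.children["01"].children))
```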
- get_consensus(min_support: float, gene_delimiter: str = '*', field_delimiter: str = ':', max_support: float = 0) Tuple[List[str], List[float]][source]
Returns up to two consensus solutions that meet the minimum support threshold. This is essentially a tree search.
Two additional parameters can be provided to format the allele strings.
- Parameters:
min_support (float) – Minimum proportion of support required. Each program contributes votes equally; the number of times the allele has been genotyped is stored in each node of the tree. This value is used to filter consensus alleles based on their support values.
gene_delimiter (str) – Delimiter between gene name and fields.
field_delimiter (str) – Delimiter between fields.
- Returns:
Consensus alleles and their support values.
- Return type:
Tuple[List[str], List[float]]
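One way to picture a support-filtered tree search is the simplified sketch below. It is not the library's algorithm: Node, add, and consensus are hypothetical names. The sketch assumes that support is normalised by an explicit total_votes argument and that the deepest prefix meeting the threshold becomes a candidate. The max_support parameter is left out because its behaviour is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical tree node: accumulated support plus child nodes."""
    support: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)


def add(root: Node, fields: List[str], weight: float = 1.0) -> None:
    """Add one genotyping call as a path of field values."""
    node = root
    for value in fields:
        node = node.children.setdefault(value, Node())
        node.support += weight


def consensus(root: Node, gene: str, total_votes: float, min_support: float,
              gene_delimiter: str = "*", field_delimiter: str = ":",
              limit: int = 2) -> Tuple[List[str], List[float]]:
    """Depth-first search keeping only branches that meet min_support."""
    results: List[Tuple[str, float]] = []

    def walk(node: Node, path: List[str]) -> None:
        passing = [(value, child) for value, child in node.children.items()
                   if child.support / total_votes >= min_support]
        if not passing:
            if path:  # deepest prefix that still meets the threshold
                name = gene + gene_delimiter + field_delimiter.join(path)
                results.append((name, node.support / total_votes))
            return
        for value, child in passing:
            walk(child, path + [value])

    walk(root, [])
    results.sort(key=lambda item: item[1], reverse=True)
    top = results[:limit]
    return [name for name, _ in top], [support for _, support in top]


root = Node()
for call in (["01", "01"], ["01", "01"], ["01", "02"]):  # three tools, one call each
    add(root, call)
alleles, supports = consensus(root, "A", total_votes=3, min_support=0.5)
print(alleles, supports)
```

With a lower threshold such as min_support=0.3, both branches under the first field would pass in this sketch, which gives two candidates ordered by support.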
- class alleleTools.allele.ParsingStrategy[source]
Bases:
ABC
Abstract base class for allele parsing strategies.
- class alleleTools.allele.RegexParser(pattern: str, field_delimiter: str, gene_delimiter: str)[source]
Bases:
ParsingStrategy
Parser for alleles using regular expressions. It only requires the regex pattern, which should contain named groups for ‘gene’, ‘field1’, ‘field2’, etc. An optional group ‘confidence’ can also be included to capture confidence scores.
field_delimiter and gene_delimiter can also be specified. These parameters are only used to format the allele string representation.
This parser is more flexible and robust than the DelimitedParser, as it can handle more complex allele formats. However, it requires knowledge of regular expressions.
Example
For allele "A*01", pattern = r"(?:(?P<gene>\w+)\*)?(?P<field1>\d{2})"
- Results:
gene = "A"
fields = ["01"]
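The named-group approach can be tried with Python's standard re module alone. The pattern below extends the one-field example to three fields and an optional parenthesised confidence score. PATTERN and parse are hypothetical names for this sketch, and the pattern is only one possible choice, not the library's default.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern: optional gene, up to three fields, optional "(score)".
PATTERN = re.compile(
    r"(?:(?P<gene>\w+)\*)?"
    r"(?P<field1>\d{2,3})"
    r"(?::(?P<field2>\d{2,3}))?"
    r"(?::(?P<field3>\d{2,3}))?"
    r"(?:\((?P<confidence>[\d.]+)\))?"
)


def parse(code: str) -> Tuple[Optional[str], List[str], Optional[float]]:
    """Return (gene, fields, confidence) extracted via named groups."""
    match = PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"cannot parse allele: {code!r}")
    groups = match.groupdict()
    fields = [groups[name] for name in ("field1", "field2", "field3")
              if groups[name] is not None]
    confidence = float(groups["confidence"]) if groups["confidence"] else None
    return groups["gene"], fields, confidence


print(parse("A*01:01(0.95)"))
print(parse("DRB1*15:01:01"))
```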