Format commands

The format module allows users to convert allele data between different formats. It supports a variety of input and output formats, making it easy to integrate with other tools and workflows.

to_vcf

Appends the genotyped alleles to an existing VCF file.

altools format to_vcf resources/hla_diversity.txt \
    --loci_file resources/gene_locations.tsv \
    --vcf file_to_append_to.vcf

Input file

The genotype file is tab-separated. The first column is the sample name, and each gene has a pair of columns (one per allele). The two columns of a gene are named “gene” and “gene.1”, e.g.:

"id" "sbgroup" "A" "A.1"
"sample1" "CEPH" "03:01" "02:01"
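
To see which genes a genotype file declares, a small helper can read the header and keep every column that is immediately followed by its ".1" partner. This is a minimal sketch, not part of altools: list_genes is a hypothetical name, and it assumes the two columns of a gene are adjacent, as in the example above. Because awk splits fields on any run of spaces or tabs, it works whether the file is tab- or space-separated.

```shell
# Hypothetical helper (not part of altools): print the gene names declared in
# a genotype file header. A column "X" counts as a gene when the next column
# is "X.1"; unpaired columns such as "id" and "sbgroup" are skipped.
list_genes() {
  head -n 1 "$1" | tr -d '"' | awk '{
    for (i = 1; i < NF; i++) {
      if ($(i + 1) == ($i ".1")) {
        print $i
        i++    # skip the ".1" partner column
      }
    }
  }'
}
```

For the example header above, list_genes prints A.
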

--loci_file:

Additionally, a list of gene locations is required. The file should be tab-separated with the following format:

gene    start
HFE    6:26087441
HLA-A    6:29942554

The first column is the gene name and the second column is the location, written as chromosome:position. Gene positions can be looked up in Ensembl or the UCSC Genome Browser. The sample file used in this repository was obtained from a post on IPD-IMGT.
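Because the loci file must be tab-separated, a quick layout check can catch rows that use spaces instead of a tab or lack the chromosome:position form. This is a minimal sketch with a hypothetical helper name, check_loci; it only checks the layout described above (a header line, then gene, a tab, and chromosome:position), not whether the coordinates themselves are correct.

```shell
# Hypothetical helper (not part of altools): report loci-file rows that do not
# follow "gene<TAB>chromosome:position". Malformed rows are printed with their
# line number, and the function returns non-zero if any were found.
check_loci() {
  awk -F'\t' '
    NR == 1 { next }                        # skip the "gene start" header
    NF != 2 || $2 !~ /^[^:]+:[0-9]+$/ {     # need 2 fields and chrom:pos
      print "line " NR ": " $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Usage: check_loci resources/gene_locations.tsv prints nothing and returns 0 when every row is well-formed.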

Output file

This file must already exist, because the alleles from the input file are appended to it. It must contain a VCF header and should not be compressed. The header should contain the gene names in the same format as in the --loci_file. The file may already contain other alleles; this does not affect the output.

You need to ensure that the file contains only the samples in the genotype file and no others; otherwise, the appended alleles will not match the samples listed in the header. To filter the samples, you can use bcftools:

awk 'NR > 1 {print $1}' resources/hla_diversity.txt | tr -d '"' | uniq > samples_id.txt
bcftools view --force-samples -S samples_id.txt test.vcf > filtered.vcf
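
After filtering, you can confirm that the sample columns of the VCF header match samples_id.txt. The sketch below uses a hypothetical helper name, vcf_samples_match; it reads the sample names from the #CHROM header line, where they follow the nine fixed VCF columns, and compares them with the ID list using diff. It assumes the VCF is uncompressed, as required above, and it treats a different sample order as a mismatch, which is stricter than the text above requires.

```shell
# Hypothetical helper (not part of altools): compare the sample columns of an
# uncompressed VCF with a one-ID-per-line list. diff prints any differences
# and the function returns 0 only if both lists match, including their order.
vcf_samples_match() {
  # Sample names follow the 9 fixed columns of the "#CHROM" header line.
  grep -m 1 '^#CHROM' "$1" | cut -f 10- | tr '\t' '\n' | diff - "$2"
}
```

Usage: vcf_samples_match filtered.vcf samples_id.txt && echo "samples match"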

from_vcf

Converts a VCF file to a genotype table. This is useful for converting the output of HLA imputation tools (e.g., SNP2HLA, HLA-TAPAS) into a genotype table that other tools can use.

altools format from_vcf only_hla.vcf --phe input.phe --out output.pyhla

Input

A VCF file filtered to contain only the HLA genes. You can use bcftools to do this:

bcftools view --include 'ID~"HLA"' raw_imputed.vcf > only_hla.vcf

--phe:

Additionally, a phenotype file can be provided. This file should follow the PLINK .phe format. The first column is the sample ID and the second column is the phenotype (0/1 for case/control). The sample IDs must match those in the VCF file.
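
A quick way to check the last requirement is to list the phenotype IDs that do not appear in the VCF header. This is a minimal sketch with a hypothetical helper name, phe_ids_not_in_vcf; it assumes the two-column layout described above, with the sample ID in the first column, and reads the sample names from the VCF #CHROM line.

```shell
# Hypothetical helper (not part of altools): print the IDs in the first column
# of a phenotype file that are not sample columns of the VCF "#CHROM" line.
# Usage: phe_ids_not_in_vcf input.phe only_hla.vcf  (no output = all found)
phe_ids_not_in_vcf() {
  awk '
    FNR == NR {                        # first file read: the VCF
      if ($1 == "#CHROM")
        for (i = 10; i <= NF; i++)     # sample names start at column 10
          in_vcf[$i] = 1
      next
    }
    NF && !($1 in in_vcf) { print $1 } # second file: the phenotype table
  ' "$2" "$1"
}
```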

Output

A .pyhla file compatible with pyHLA and PyPop will be generated.

allele_resolution

Truncates alleles to a uniform resolution (number of fields). This is especially useful before association analyses. The examples below show how alleles are normalized at different resolutions:

- resolution 1:
    - 01:01 -> 01
    - 02:03 -> 02
- resolution 2:
    - 01:01 -> 01:01
    - 02:03:01 -> 02:03
- resolution 3:
    - 01:01 -> NA
    - 02:03:01:01 -> 02:03:01
altools format allele_resolution input.alt --resolution 3
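
The truncation rule illustrated by the examples can be sketched as a small shell function. This is an illustration only, not the altools implementation: truncate_allele is a hypothetical name, and returning NA whenever an allele has fewer fields than the requested resolution is generalized from the single resolution 3 example above.

```shell
# Hypothetical helper (not the altools implementation): keep the first N
# colon-separated fields of an allele; alleles with fewer than N fields
# become NA, generalizing the "resolution 3" example above.
truncate_allele() {
  printf '%s\n' "$1" | awk -F: -v n="$2" '{
    if (NF < n) { print "NA"; next }
    out = $1
    for (i = 2; i <= n; i++) out = out ":" $i
    print out
  }'
}
```

For example, truncate_allele 02:03:01:01 2 prints 02:03, and truncate_allele 01:01 3 prints NA.
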

Input

An allele table (.alt).

Output

An allele table (.alt) with normalized allele resolutions.

from_ikmb

Generates a consensus HLA genotype from the results of multiple HLA genotyping algorithms. It is tailored to the reports produced by the ikmb HLA genotyping pipeline.

altools format from_ikmb --input "IKMB_Reports/*.json" \
    --output "output.txt" \
    --format pyhla

Note

For more information about the consensus algorithm, see Consensus from multiple genotypes.

Input

Multiple JSON files in the report format of the ikmb HLA genotyping pipeline. These report files should be placed in a folder called IKMB_Reports/.

Output

An allele table (.alt) file will be generated containing the consensus genotype for each sample. The default output name is output.alt.

from_ukb

Converts the imputed HLA data from the UK Biobank to an allele table (.alt).

altools format from_ukb --phenotype file.pheno input.alt

Input

The input file is a tab-separated file containing the imputed HLA data from the UK Biobank.

--phenotype:

Additionally, a phenotype file can be provided. This file should follow the PLINK .phe format. The first column is the sample ID and the second column is the phenotype (0/1 for case/control). The sample IDs must match those in the input file.

Output

An allele table (.alt) file will be generated containing the genotype for each sample. The default output name is output.alt.

from_kirmapper

Converts the KIR genotyping results from KIR-mapper to an allele table (.alt).

altools format from_kirmapper kir_results.tsv --output output.alt

Input

A glob pattern matching the report files generated by KIR-mapper.
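
When the pattern matches several files, quote it, as in the from_ikmb example, so that the shell passes the pattern through unchanged; this assumes, as the description above suggests, that from_kirmapper expands the pattern itself. An unquoted pattern is expanded by the shell into separate arguments before the command runs. The sketch below shows the difference using a throwaway directory with placeholder files and a hypothetical count_args function that prints how many arguments it received.

```shell
# Throwaway demo: two placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/s1.tsv" "$demo_dir/s2.tsv"

# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

count_args "$demo_dir"/*.tsv   # unquoted glob, expanded by the shell: prints 2
count_args "$demo_dir/*.tsv"   # quoted glob, passed through as-is: prints 1
```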

--phenotype:

Additionally, a phenotype file can be provided. This file should follow the PLINK .phe format. The first column is the sample ID and the second column is the phenotype (0/1 for case/control). The sample IDs must match those in the input file.

--remove_pheno_zero:

If this flag is set, samples with phenotype 0 will be removed from the output file.
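
To preview which samples the flag would drop, you can filter the phenotype file directly. This is a minimal sketch assuming the two-column layout described above (sample ID, then phenotype); pheno_zero_samples is a hypothetical helper name, not an altools command.

```shell
# Hypothetical helper (not an altools command): print the IDs of samples whose
# phenotype (second column) is 0, i.e. the samples --remove_pheno_zero drops.
pheno_zero_samples() {
  awk '$2 == 0 { print $1 }' "$1"
}
```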

Output

An allele table (.alt) file will be generated containing the genotype for each sample. The default output name is output.alt.

from_immuanot

Converts the HLA genotyping results from Immuannot to an allele table (.alt).

altools format from_immuanot immuanot_results.tsv --output output.alt

Input

The report file generated by Immuannot.

Output

An allele table (.alt) file will be generated containing the genotype for each sample. The default output name is output.alt.

hla_group

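Groups HLA alleles into either P groups or G groups. P groups collect alleles that encode identical protein sequences for the peptide-binding domains (exons 2 and 3 for HLA class I, exon 2 for class II), and G groups collect alleles with identical nucleotide sequences for those exons.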

altools format hla_group input.alt --group pgroup --output output.alt

Input

An allele table (.alt) file containing HLA alleles.

Output

An allele table (.alt) file with grouped HLA alleles.