bleties

IES retention analysis from PacBio CCS reads


MILTEL - Method of Long-read TELomere detection

During development of MAC genomes in ciliates, chromosomes are fragmented and telomeric repeats are added to the newly formed ends. The extent of this fragmentation varies between species, and in some species it is known to be regulated by conserved chromosome breakage sequences close to the breakage sites.

Input data

Identification of telomere sequences from soft-clipped sequences

When a telomere-bearing sequence of MAC origin is mapped onto a MIC reference sequence, the telomeric part does not align and is usually soft-clipped. Soft clipping operations are limited to the ends of a read, i.e. there can be no more than two soft clips on a single read.

MILTEL considers each mapped read with soft clips, and extracts the clipped segment of the query sequence, the coordinates of the clip with respect to the reference, and whether the unmapped sequence is to the left (5’) or right (3’) of the reference coordinate.
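The extraction step described above can be sketched in plain Python. This is a minimal illustration, not MILTEL's own code: the function name `soft_clips` is hypothetical, the inputs are the POS, CIGAR and SEQ fields of a SAM record, and the reported coordinate (the 1-based aligned reference base adjacent to the clip) is an assumed convention that may differ from the one MILTEL uses internally.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def soft_clips(pos, cigar, seq):
    """Return (clipped_sequence, ref_coordinate, side) for each soft clip.

    pos   : 1-based leftmost reference position of the alignment (SAM POS)
    cigar : CIGAR string, e.g. "5S10M3S"
    seq   : query sequence as stored in the SAM record
    The coordinate convention here is an assumption made for illustration.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard-clipped bases are absent from seq, so ignore them when inspecting the ends
    ops = [(n, op) for n, op in ops if op != "H"]
    clips = []
    if ops and ops[0][1] == "S":
        n = ops[0][0]
        # Unaligned sequence lies to the left of the first aligned base
        clips.append((seq[:n], pos, "left"))
    if len(ops) > 1 and ops[-1][1] == "S":
        n = ops[-1][0]
        ref_len = sum(k for k, op in ops if op in REF_CONSUMING)
        # Unaligned sequence lies to the right of the last aligned base
        clips.append((seq[-n:], pos + ref_len - 1, "right"))
    return clips
```

Because soft clips can only occur at the two ends of the alignment (inside any hard clips), checking the first and last remaining operations is sufficient.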

Each clipped segment extracted above is searched for the user-provided telomeric repeat sequence using NCRF, which can find tandem repeats in the presence of noise from sequencing errors. Where a telomeric repeat above a minimum length is found, MILTEL also records the gap distance (in bp) from the beginning or end of the telomeric repeat to the clipping junction, and whether the telomere sequence is reverse-complemented.
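To make the gap and orientation bookkeeping concrete, here is a simplified sketch. It does not call NCRF and is not noise-tolerant: it swaps in exact regular-expression matching as a stand-in, so it would miss repeats interrupted by sequencing errors. The function names, the `min_copies` parameter and the returned dictionary keys are all hypothetical.

```python
import re


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_telomere(segment, unit, side, min_copies=2):
    """Find the longest exact tandem run of `unit` (either strand) in a clipped segment.

    side is "left" or "right", i.e. where the clipped segment lies relative to
    the aligned part of the read. Returns None if no run is found, otherwise a
    dict with start/end (0-based, half-open, within the segment), the gap in bp
    to the clipping junction, and whether the match is reverse-complemented.
    """
    best = None
    for is_rc, u in ((False, unit), (True, revcomp(unit))):
        # Try every rotation so that runs starting mid-unit are still found
        for i in range(len(u)):
            rot = u[i:] + u[:i]
            pattern = "(?:%s){%d,}" % (re.escape(rot), min_copies)
            for m in re.finditer(pattern, segment):
                if best is None or m.end() - m.start() > best["end"] - best["start"]:
                    best = {"start": m.start(), "end": m.end(), "revcomp": is_rc}
    if best is None:
        return None
    # Left clip: the junction is at the segment's right end.
    # Right clip: the junction is at the segment's left end.
    best["gap"] = len(segment) - best["end"] if side == "left" else best["start"]
    return best
```

A gap of zero means the repeat run directly abuts the clipping junction; larger gaps indicate intervening sequence between the telomere and the point where the alignment ends.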

Output

MILTEL produces the following output files, where {OUT} is the output prefix supplied to the --out option:

The GFF3 file for the alternative telomere addition sites has MILTEL in the source column (column 2). Coordinates where clipped sequence segments contain telomeres are reported as putative breakage sites, with chromosome_breakage_site in the type column (column 3). Because this is a feature of zero length, the start and end fields (columns 4 and 5) are equal, and the junction is to the right of the coordinate, following GFF convention.
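As an illustration of the record layout described above, the following sketch formats a single zero-length feature as a tab-separated GFF3 line. The function name is hypothetical, and writing "." for strand and phase and four decimal places for the score are assumptions rather than documented MILTEL behavior. Attributes are passed in by the caller.

```python
def gff3_breakpoint_line(seqid, coord, score, attributes):
    """Format one zero-length chromosome_breakage_site feature as a GFF3 line.

    attributes is a dict of key/value pairs for column 9.
    Strand and phase are written as "." here, which is an assumption.
    """
    attr = ";".join("%s=%s" % (k, v) for k, v in attributes.items())
    fields = [
        seqid,
        "MILTEL",                    # column 2: source
        "chromosome_breakage_site",  # column 3: type
        str(coord),                  # column 4: start
        str(coord),                  # column 5: end (equal to start: zero-length feature)
        "%.4f" % score,              # column 6: score
        ".",                         # column 7: strand
        ".",                         # column 8: phase
        attr or ".",                 # column 9: attributes
    ]
    return "\t".join(fields)
```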

The score (column 6) reports the breakage score, which is the number of telomere-bearing reads clipped at that specific coordinate, divided by the total read coverage at that coordinate (as reported in the average_coverage attribute, described below).
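The score calculation is a simple ratio, sketched below under the assumption that telomere-bearing clips have already been collected as (sequence ID, coordinate) pairs and that coverage per coordinate is available as a dictionary. Both data structures and the function name are hypothetical.

```python
from collections import Counter


def breakage_scores(telomere_clips, coverage):
    """Compute the breakage score per site.

    telomere_clips : iterable of (seqid, coord), one entry per telomere-bearing
                     read clipped at that coordinate
    coverage       : dict mapping (seqid, coord) to total read coverage there
    Returns a dict mapping (seqid, coord) to score, or None where coverage is 0.
    """
    counts = Counter(telomere_clips)
    scores = {}
    for site, n in counts.items():
        cov = coverage.get(site, 0)
        # Guard against division by zero where no coverage was recorded
        scores[site] = n / cov if cov > 0 else None
    return scores
```

For example, three telomere-bearing reads clipped at a coordinate covered by twelve reads give a score of 3 / 12 = 0.25.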

The attributes (column 9) contain the following fields:

With the --dump option, internal data are dumped in JSON format for troubleshooting to: {OUT}.miltel.dump.json.