Abstract: | Genomic annotation encodes significant parts of the genome as sets of intervals. Significant overlap between these intervals can suggest biological connections. This thesis builds
upon a previous method that uses two-state Markov chains to compute p-values for
these overlaps. However, a two-state Markov chain generates geometrically distributed
interval and gap lengths, which is a restrictive assumption. By employing discrete phase-type distributions, which can
approximate arbitrary discrete distributions, we mitigate this limitation. We evaluate various Markov chain architectures for performance and expressive power. We
develop a software tool to fit these distributions to real genomic data. Finally, we
incorporate absorbing Markov chains into the original tool for computing p-values
for overlapping genome annotations.
|
---|
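
To illustrate the contrast drawn in the abstract, the following minimal sketch computes the length distribution produced by an absorbing Markov chain, that is, a discrete phase-type distribution. It is not the thesis's tool: the function name `phase_type_pmf`, the stay probability `p = 0.9`, and both example chains are assumptions chosen for illustration. With a single transient state the lengths are geometric; with two transient states in series the distribution already has a different shape.

```python
def phase_type_pmf(alpha, T, max_len):
    """Return [P(L = 1), ..., P(L = max_len)] for a discrete phase-type distribution.

    alpha: initial probabilities over the transient states
    T:     sub-stochastic transition matrix among the transient states
           (the missing row mass is the probability of absorption)
    """
    n = len(alpha)
    # Probability of leaving the transient states (ending the interval) from each state.
    exit_prob = [1.0 - sum(row) for row in T]
    state = list(alpha)  # distribution over transient states before the current step
    pmf = []
    for _ in range(max_len):
        # The interval ends at this step if the chain is absorbed now.
        pmf.append(sum(state[i] * exit_prob[i] for i in range(n)))
        # Otherwise advance one step within the transient states.
        state = [sum(state[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pmf


p = 0.9  # illustrative probability of staying in the current state

# One transient state: equivalent to the interval state of a two-state chain,
# so lengths are geometric, P(L = k) = p**(k - 1) * (1 - p).
geometric = phase_type_pmf([1.0], [[p]], 5)

# Two transient states in series: the length is a sum of two geometric stages,
# so length 1 is impossible and the probabilities initially increase.
two_phase = phase_type_pmf([1.0, 0.0], [[p, 1 - p], [0.0, p]], 5)

print(geometric)
print(two_phase)
```

The geometric probabilities can only decrease with length, whereas the two-stage chain places no mass on length 1 and rises at first. Adding states and transitions in this way is what gives phase-type distributions their extra expressive power.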