Abstract: | The goal of this master's thesis is to computationally identify modified DNA bases from
raw MinION data.
The MinION is a portable DNA sequencing device, which does not require DNA
amplification in the sample preparation step. Consequently, DNA modifications are
still present in the DNA strand, which is sequenced by passing through the nanopore.
Modified bases cause shifts in the measured signal which can later be identified
computationally.
Current tools for the identification of modified bases from MinION data require
a labeled training set which is composed of modified and non-modified (canonical)
bases. It is quite difficult and expensive to experimentally create this kind of dataset.
In this thesis, we use a semi-supervised approach to this problem instead. We train
an autoencoder on a dataset without modifications to learn the characteristics of
non-modified bases. We then analyze the reconstruction error of the autoencoder to identify
bases that do not conform to the learnt characterization.
In our work, we have focused on DNA methylation, but our approach can be used
for the detection of any DNA modification. Our results show that the reconstruction
error of the autoencoders does not allow us to differentiate between methylated and
unmethylated DNA bases using only a single read. However, when we aggregate
reconstruction errors from multiple reads, we get a more promising result: for most
of the methylations, ten reads are enough to differentiate between methylated and
unmethylated samples.
|
---|
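To illustrate why aggregating reconstruction errors over multiple reads can help where a single read does not, the following is a minimal sketch and not the method used in the thesis. It assumes that the per-read reconstruction errors at one position are already available as plain numbers, and it aggregates them with a simple z-score of the mean against errors from canonical reads. The function names `aggregated_z_score` and `looks_modified`, the threshold of 3.0, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import statistics


def aggregated_z_score(canonical_errors, sample_errors):
    """Z-score of the mean sample error relative to canonical errors.

    Assumption: errors are roughly independent across reads, so the
    standard error of the mean shrinks with sqrt(number of reads).
    """
    mu = statistics.fmean(canonical_errors)
    sigma = statistics.stdev(canonical_errors)
    standard_error = sigma / math.sqrt(len(sample_errors))
    return (statistics.fmean(sample_errors) - mu) / standard_error


def looks_modified(canonical_errors, sample_errors, z_threshold=3.0):
    """Flag a position when the aggregated error is unusually high."""
    return aggregated_z_score(canonical_errors, sample_errors) > z_threshold


# Synthetic reconstruction errors (illustrative values, not thesis data).
canonical = [0.8, 1.2, 0.9, 1.1, 1.0, 1.0, 0.7, 1.3]
single_read = [1.2]
ten_reads = [1.0, 1.4, 1.1, 1.3, 1.2, 1.2, 1.0, 1.4, 1.1, 1.3]

print(looks_modified(canonical, single_read))  # False
print(looks_modified(canonical, ten_reads))    # True
```

In this toy example, the single read and the ten reads have the same mean error. Only the aggregated case crosses the threshold, because the uncertainty of the mean shrinks as the number of reads grows.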