
Page manager: Web editorial team
Page last updated: 2012-09-11 15:12



Modeling of bacterial DNA patterns important in horizontal gene transfer using stochastic grammars

Licentiate thesis
Author Mariana Buongermino Pereira
Date of examination 2015-06-17
Opponent at public defense Magnus Alm Rosenblad
Publisher Chalmers University of Technology
Place of publication Göteborg
Year of publication 2015
Published at Department of Mathematical Sciences, Mathematical Statistics
Language English
Keywords Stochastic context-free grammars, hidden Markov model, conditional random fields, integrons, attC sites, secondary structure.
Subject categories Applied mathematics, Mathematical statistics, Bioinformatics (computational biology), Biostatistics, Computational linguistics

Abstract

DNA contains genes, which carry the blueprints for all processes necessary to maintain life. In addition to genes, DNA also contains a wide range of functional patterns, which govern many of these processes. These functional patterns typically have a high variability, both within and between species, which makes them hard to detect. Stochastic models, such as hidden Markov models and conditional random fields, offer flexible frameworks that can be used to describe these patterns, their variability and their dependencies. In this thesis, we describe two such models for the identification of attC sites, patterns necessary for the sharing of genes between bacteria in a process known as horizontal gene transfer. Acquired genes that cause bacteria to become resistant to antibiotics are often associated with attC sites, which makes their identification highly relevant.

In the first paper, we develop a stochastic regular grammar defined by an eight-state generalized hidden Markov model that describes the sequence conservation and length distribution of the different parts of an attC site. The model assumptions were evaluated and improved using cross-validation experiments, which resulted in a high sensitivity in detecting attC sites. The model was applied to a real dataset in the form of a well-studied plasmid and found the majority of the attC sites present. In addition, six metagenomic samples from polluted and pristine environments were analysed. The model predicted a 15-fold higher abundance of attC sites in the polluted environments than in the pristine ones. The model is implemented in R as HattCI, which is freely available at http://bioinformatics.math.chalmers.se/HattCI.
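As a rough illustration of how a hidden Markov model can label positions in a DNA sequence, the sketch below runs Viterbi decoding on a toy two-state model. It is not the eight-state generalized model of the thesis and has no explicit length distributions; the state names `bg` and `site` and all probabilities are assumptions invented for this example.

```python
import math

def viterbi(seq, states, log_start, log_trans, log_emit):
    """Return the most probable state path for seq under a plain HMM."""
    # score[s]: best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit[s][seq[0]] for s in states}
    back = []  # one back-pointer table per position after the first
    for sym in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][sym]
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}

# Toy parameters (assumptions, not estimated from data):
# "bg" emits all bases uniformly, "site" prefers G and C.
states = ["bg", "site"]
log_start = logs({"bg": 0.5, "site": 0.5})
log_trans = {
    "bg": logs({"bg": 0.9, "site": 0.1}),
    "site": logs({"bg": 0.1, "site": 0.9}),
}
log_emit = {
    "bg": logs({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}),
    "site": logs({"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05}),
}

path = viterbi("ATAT" + "GC" * 5 + "ATAT", states, log_start, log_trans, log_emit)
print("".join("S" if s == "site" else "-" for s in path))
```

In a generalized hidden Markov model, each state would additionally emit a whole segment whose length follows its own distribution, which is how the length of each part of an attC site can be modelled.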

AttC sites fold into a three-dimensional structure that is crucial for the horizontal transfer of genes. In the second paper, we extend our previous model to include specific information about this folding. We develop a stochastic context-free grammar, which is suited to describing the nested dependencies induced by the structure. The grammar includes features that describe thermodynamic properties of the folding. The model is formulated in the framework of conditional random fields, with parameter estimation done numerically using structured support vector machines. A first implementation of the model has been completed; further experiments, such as evaluation of the performance using cross-validation, are planned.
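The nested dependencies mentioned above can be illustrated with a Nussinov-style dynamic program, which finds the largest set of nested complementary base pairs in a sequence. This is a standard textbook simplification and not the grammar of the thesis: it has no probabilities, no thermodynamic features and no learned parameters. The function name `max_nested_pairs` and the `min_loop` default are assumptions made for this sketch.

```python
def max_nested_pairs(seq, min_loop=3):
    """Maximum number of nested Watson-Crick pairs in a DNA sequence.

    min_loop is the minimum number of unpaired bases that must lie
    between the two bases of a pair.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j]: maximum number of pairs within seq[i..j], inclusive
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base i or base j is left unpaired
            score = max(best[i + 1][j], best[i][j - 1])
            # Case 2: i pairs with j, enclosing the interval in between
            if (seq[i], seq[j]) in pairs:
                score = max(score, best[i + 1][j - 1] + 1)
            # Case 3: the interval splits into two independent parts
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]

# A small hairpin: three G-C pairs closing a loop of four A's
print(max_nested_pairs("GGGAAAACCC"))
```

The three cases mirror the productions of a small context-free grammar (leave a base unpaired, wrap a pair around a substructure, or concatenate two substructures). A stochastic context-free grammar attaches probabilities or feature weights to such productions instead of simply counting pairs.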

This thesis demonstrates the flexibility of stochastic grammars for modelling the variability and dependencies in DNA patterns. It also emphasizes the value of stochastic methods in the field of microbiology and infectious diseases.
