Tool development projects

These projects usually emerge from analyses that we have identified as being of general interest. BCF devotes time to implementing workflows and tools and makes them publicly available through our GitHub page. We are thankful to the collaborators who helped us validate the predictions of the resulting tools.

Biomarker discovery with machine learning

BDC contributor(s): Björn Andersson, Peidi Liu and Erik Lorentzen. In collaboration with Britta Langen (UGoT)

Tumor radiotherapy and basic radiation research rely on an accurate relation between the absorbed dose and the therapeutic or biological effect after irradiation. A poorly characterized dose-response relation can cause severe issues, such as under-treatment of cancers (and thus disease progression) or exposure of healthy tissue that leads to secondary diseases. We are developing a machine learning tool based on omics data for biomarker discovery in radiation research.

Publications

(Poster) Radiotherapy biomarkers discovery using machine learning approaches (External link)

InVi: Integration & Visualization of genomic data

BDC contributor(s): Luciano Fernández and Marcela Dávila. In collaboration with Christina Jern (UGoT)

Advanced visualization of genomic data is vital to allow researchers to explore and understand the complexities of their experimental data or large-scale datasets. Complex data visualization techniques exist today, but their nature makes them difficult to use. To facilitate the exploration and creation of advanced genomics visualizations and to support knowledge discovery, we developed the software InVi (Integration and Visualization of Genomic Data) and CiGUI (Circos Graphic User Interface), both of which rely on Circos for circular displays.

Dissemination

Ioniser: Assisting glycostructures annotation

BDC contributor(s): Dagmara Gotlib. In collaboration with the Proteomics Core Facility

The characterization of glycosylated proteins is a challenging task in the proteomics field, as they commonly occur as multiple glycoforms. The Ioniser assists in identifying potentially novel glycoforms without the need for prior knowledge of the existing glycostructures for a given peptide. It processes and filters large amounts of mass-to-charge (m/z) ratio and abundance data, allowing the user to identify additional glycosylated proteins based on user-specified parameters.

Dissemination

mitoChip-seq: Mitochondria-specific peak detection

BDC contributor(s): Sanna Abrahamsson and Marcela Dávila. In collaboration with Mara Domio and Sjöerd Wanrooij (Umeå University)

ChIP-Seq is a powerful method for identifying genome-wide DNA binding sites for transcription factors and other proteins. Standard software and pipelines are available for the analysis and interpretation of such data. However, the mitochondrial genome has been neglected, and most of these algorithms are not adequate for correctly processing data from proteins that target this small circular chromosome. Here we present a simple workflow that automates some basic statistics and visualization aids with a focus on the mitochondrial genome.

Publications

Submitted

Odyssey 2.1.1: Imputation of genomic data

BDC contributor(s): Björn Andersson and Alina Orozco. In collaboration with Tara Stanne (UGoT)

Odyssey 2.1.0 is a semi-autonomous workflow designed for the preparation, phasing and imputation of genomic data. Odyssey 2.1.1 is modified to run directly from the data folder on an HPC system or a designated file system that contains your data of interest, which can be specified in the Setting.conf. Additionally, the imputation option has been narrowed to Minimac2 because of speed differences compared to Impute4. Other functionalities of Odyssey remain the same and can be reviewed in the modified documentation materials.

Dissemination

P-PSY-Finder: Detection of processed pseudogenes

BDC contributor(s): Sanna Abrahamsson and Marcela Dávila. In collaboration with Anna Rohlin (Laboratory Medicine at UGoT)

Processed pseudogenes (PΨgs) are disabled gene copies that are transcribed and may affect the expression of paralogous genes. Moreover, their insertion in the genome can disrupt the structure or the regulatory region of a gene, affecting its transcription. Such insertions have been identified as mutations occurring during cancer development, so being able to identify processed pseudogenes and their location will improve somatic mutation testing in the clinical setting. PΨFinder is a tool that automatically predicts novel PΨgs from DNA sequencing data and determines their location in the genome with high accuracy. It generates high-quality figures and tables that aid in the interpretation of the results and guide the experimental validation. PΨFinder complements any mutational screening aimed at identifying disease-causing mutations in cancer and other diseases.

Publications

REAPER: A light-weight file monitor

BDC contributor(s): Dagmara Gotlib. In collaboration with the Proteomics Core Facility

Mass spectrometry analyses produce large amounts of data. As the computer that performs the analysis has limited storage, it is of great interest to move the files to other storage as soon as possible. The Reaper monitors a specific directory where the files are created and updated, and checks that directory for changes at a user-defined time interval. When a file has not changed in size by the third check, it is copied to the appropriate location.
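The polling logic described above can be sketched as follows. This is a minimal illustration, not the Reaper's actual code: the function names `poll_once` and `watch`, the `state` dictionary and the default interval are all hypothetical, and "unchanged by the third check" is interpreted here as the same file size being observed on three consecutive checks.

```python
import shutil
import time
from pathlib import Path


def poll_once(watch_dir, dest_dir, state, stable_checks=3):
    """Run one check of watch_dir and copy files whose size has stopped changing.

    state maps file name -> (last_size, consecutive_checks_with_that_size).
    Returns the names of the files copied during this check.
    Assumption: a file is "done" once the same size is seen stable_checks times in a row.
    """
    watch_dir, dest_dir = Path(watch_dir), Path(dest_dir)
    copied = []
    for path in sorted(watch_dir.iterdir()):
        if not path.is_file():
            continue
        size = path.stat().st_size
        last_size, seen = state.get(path.name, (None, 0))
        # Restart the count whenever the size differs from the previous check.
        seen = seen + 1 if size == last_size else 1
        state[path.name] = (size, seen)
        # Copy exactly once, at the moment the file reaches the stability threshold.
        if seen == stable_checks:
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest_dir / path.name)
            copied.append(path.name)
    return copied


def watch(watch_dir, dest_dir, interval_seconds=60.0, max_checks=None):
    """Poll watch_dir every interval_seconds; stop after max_checks if given."""
    state = {}
    checks = 0
    while max_checks is None or checks < max_checks:
        poll_once(watch_dir, dest_dir, state)
        checks += 1
        time.sleep(interval_seconds)
```

Comparing file sizes rather than modification times keeps the check cheap and matches the description above; a file that grows again after being copied would be copied a second time once its new size is stable.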

Dissemination

(Report) Gotlib, D. (2020) Reaper: A lightweight file monitor (External link)

TC-Hunter: Transgenic insertion sites detection

BDC contributor(s): Vanja Börjesson and Marcela Dávila. In collaboration with Jelena Milosevic (Karolinska Institutet)

Transgenic animal models are crucial for the study of gene function and disease, and are widely used in basic biological research, agriculture and the pharmaceutical industry. Since current methods for generating transgenic animals result in random integration of the transgene under study, the phenotype may be compromised by the disruption of known genes or regulatory regions. We implemented TC-Hunter (Transgene-Construct hunter), an open tool that identifies transgene insertion sites and provides simple reports and visualization aids. It relies on common tools used in the analysis of high-throughput data and makes use of chimeric reads and discordant read pairs to identify and support the transgenic insertion site.
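To illustrate the idea of supporting evidence, the sketch below counts chimeric reads and discordant pairs per host-genome window. It is not TC-Hunter's implementation: the `Alignment` record, the function names and the 1000 bp window are hypothetical, and the definitions are simplified assumptions (a chimeric read has a split alignment with one part on the host and one on the construct; a discordant pair has one mate on the host and the other on the construct).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Simplified, hypothetical alignment record (not a real SAM/BAM parser)."""
    read_name: str
    ref: str                        # reference the read aligns to
    pos: int                        # alignment start position on ref
    mate_ref: Optional[str] = None  # reference the mate aligns to, if paired
    supp_ref: Optional[str] = None  # reference of a split (supplementary) part


def classify(aln, construct):
    """Return 'chimeric', 'discordant' or None for one alignment record."""
    on_construct = aln.ref == construct
    # Chimeric: the read itself is split between host and construct.
    if aln.supp_ref is not None and on_construct != (aln.supp_ref == construct):
        return "chimeric"
    # Discordant (simplified): the two mates map to host and construct.
    if aln.mate_ref is not None and on_construct != (aln.mate_ref == construct):
        return "discordant"
    return None


def support_by_host_bin(alignments, construct, bin_size=1000):
    """Count supporting records per (host reference, window start, evidence type)."""
    counts = Counter()
    for aln in alignments:
        kind = classify(aln, construct)
        # Only host-side records carry a host position in this simplified model.
        if kind is None or aln.ref == construct:
            continue
        window_start = aln.pos // bin_size * bin_size
        counts[(aln.ref, window_start, kind)] += 1
    return counts
```

Windows supported by both evidence types would be the natural candidates for an insertion site under these assumptions; a real analysis would read the records from aligned sequencing data rather than from hand-built objects.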