Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

An extensive dataset of UML models in GitHub

Conference paper
Authors Gregorio Robles
Truong Ho-Quang
Regina Hebig
Michel Chaudron
Miguel Angel Fernandez
Published in IEEE International Working Conference on Mining Software Repositories
ISSN 2160-1852
Publisher IEEE
Publication year 2017
Published at Department of Computer Science and Engineering (GU)
Institutionen för data- och informationsteknik, Software Engineering (GU)
Language en
Keywords dataset, GitHub, mining software repositories, modeling, UML
Subject categories Computer Systems


© 2017 IEEE. The Unified Modeling Language (UML) is widely taught in academia and well accepted in industry. However, no ample dataset of UML diagrams is publicly available. Our aim is to offer a dataset of UML files, together with metadata of the software projects to which the UML files belong. To that end, we have systematically mined over 12 million GitHub projects to find UML files in them. We present a semi-automated approach to collect UML stored in images, .xmi, and .uml files. We offer a dataset with over 93,000 UML diagrams from over 24,000 projects in GitHub.
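
The abstract names three kinds of files in which UML is stored: images, .xmi files, and .uml files. As a rough illustration only, the first filtering step of such a collection could be sketched as below. This is not the authors' tooling. The function names and the list of image extensions are assumptions made for the example, and the abstract does not say how image files were confirmed to contain UML diagrams.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional

# Model-file extensions named in the abstract.
MODEL_EXTENSIONS = {".xmi", ".uml"}
# ASSUMPTION: the abstract says only "images"; this extension list is illustrative.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".svg"}


def classify_candidate(path: str) -> Optional[str]:
    """Return 'model', 'image', or None for a repository file path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in MODEL_EXTENSIONS:
        return "model"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


def collect_candidates(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group candidate paths by kind.

    Image candidates would still need a separate check (manual or automated)
    to decide whether they actually show a UML diagram.
    """
    found: Dict[str, List[str]] = {"model": [], "image": []}
    for path in paths:
        kind = classify_candidate(path)
        if kind is not None:
            found[kind].append(path)
    return found
```

Calling `collect_candidates` on a list of file paths from one repository returns the paths grouped under `"model"` and `"image"`. Everything else is discarded at this step.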
