As the use of AI becomes more widespread in society, the need for reliable data increases, for example in the development of autonomous vehicles or environmental research.
AI is only as good as the data behind it. That is the premise driving PhD student Yi Peng's work. She started her PhD in software engineering with no prior experience in the field. Now she is researching how to make sure the data in new AI-based software systems is reliable.
“Many AI researchers focus on improving the model performance, but if the datasets are biased or skewed, the model won’t be useful in practice”, says Yi Peng, PhD student in requirements engineering.
Her research focuses on data requirements for machine learning systems: for example, defining what kind of data a system needs, how it should be collected and documented, and when you need to go back and collect more data. As the use of AI becomes more widespread in society, the need for reliable data increases, whether in the development of autonomous vehicles or in environmental science, where researchers use machine learning to classify images or make predictions.
“Environmental scientists often have huge datasets but little guidance on whether their data is suitable or unbiased. Requirements can help them understand and improve what they are working with,” says Yi Peng.
Fast-changing research field
Yi Peng’s field, requirements engineering, is a long-established area within software engineering that focuses on helping development teams agree on the requirements for a software system before they start building it. The requirements could cover what the system should do, quality and safety demands, when to update the system, and so on. Yi Peng experienced these challenges firsthand while working for a telecommunications company in China after her master’s degree in electrical engineering.
“I spent a lot of time in meetings trying to make sure everyone understood what we were building. That’s when I realised how much time and energy unclear requirements cost.”
A challenge for researchers within AI and machine learning is the speed at which the field is changing: a paper risks being outdated before it is presented at a conference. Within requirements engineering, traditional methods were designed for systems with predictable behaviour, while machine learning systems require new approaches.
“Data changes over time. If the new data looks very different from what the model was trained on, the system might start making mistakes. My work looks at how to set up requirements so that developers know when the data needs to be checked, replaced or retrained.”
Photo: Natalija Sako
New in the field of software engineering
Yi Peng comes from a family of academics, and a return to research was always in the back of her mind. However, a PhD in software engineering was not the obvious choice. Originally trained in electrical engineering and robotics in the US, she discovered her interest in machine learning while working on reinforcement learning: training robots to learn from trial and error. She knew that pursuing a PhD in the United States would be difficult due to the American restrictions on Chinese companies. When she found an opening in Sweden that focused on requirements for machine learning systems, she applied.
“I had visited Sweden as a child when my father worked as a visiting researcher in Stockholm. So, when I got the offer, it felt a bit like coming full circle.”
Since Yi Peng had no previous experience in software engineering, her PhD began with a broad review of existing research. This led to new ideas, such as using documentation standards from data science to strengthen requirement processes, which she is now exploring in collaboration with companies in the region. She is also learning to enjoy parts of academia she once found challenging.
“Networking doesn’t come naturally to me, but meeting people at conferences and workshops always gives me new perspectives”.
For Yi Peng, the motivation behind her work and research is ultimately being part of a bigger picture and moving knowledge forward.
“A PhD gives you a chance to work on a problem that hasn’t been solved yet, and that’s very exciting.”
Text: Natalija Sako
Yi Peng
Is: A doctoral student in requirements engineering at the Department of Computer Science and Engineering.
Age: 29.
From: Dalian, China.
Interests: Travelling, reading and Lego.
Fun fact: "I have started to collect magnets from each city I have visited since the start of my PhD, and the collection is growing surprisingly quickly. I now have 13".