Technical investigation of the e-mail crash presented to the university board
On 18 September 2020, a number of e-mail servers crashed, leaving about half of the university's employees without access to their e-mail and calendar for a large part of the autumn. The university board commissioned the internal audit function to investigate the circumstances surrounding the crash, and on Thursday 21 January 2021 a report was presented to the board.
The internal audit hired external specialist expertise, the company Transcendent Group, to investigate what caused the incident, why it had such serious consequences and how similar events can be prevented in the future. The investigation was carried out through interviews and analyses of documentation.
The report states that the hard drives used by the university to store e-mail had a manufacturing error in the form of a software fault. The fault meant that the disks stopped working after 40,000 hours of use. The manufacturer announced this in February 2020, but the IT unit and the subcontractor remained unaware of the problem; the report names ambiguities in the current service agreement as a contributing factor. As a result, the software update that could have fixed the manufacturing error was not installed.
The external investigators believe that this caused a large number of hard drives to stop working on 11 August in one of the university's two server halls, located in Vasaparken. However, this did not affect e-mail use because the other server hall, located on Medicinareberget, took over operation. The system is built on redundancy: storage is distributed over two physical locations and can continue to function even if one storage location is lost.
On 21 August, the failed hard drives in the server hall in Vasaparken were replaced. According to the investigation, by this point the manufacturing error was known and software updates to correct it were available. However, the software update did not succeed.
A transfer of e-mail data from the e-mail servers on Medicinareberget to the servers in Vasaparken was then started but could not be completed because of time constraints. On 18 September, the hard drives in the server hall at Medicinareberget also stopped working due to the same manufacturing error. The e-mail crash was a fact.
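To put the figures above in perspective, the short Python sketch below converts the 40,000-hour limit into years and computes the gap between the two failure dates given in the report. It assumes the disks ran continuously from the day they were first powered on, which the report summary does not state; the function and variable names are illustrative only.

```python
from datetime import date

# Figure taken from the report: disks stop working after this many hours of use.
FAILURE_THRESHOLD_HOURS = 40_000


def hours_to_years(hours: float) -> float:
    """Convert hours of continuous operation to years (365.25-day years)."""
    return hours / 24 / 365.25


def days_between(first: date, second: date) -> int:
    """Whole days from `first` to `second`."""
    return (second - first).days


# Failure dates as given in the article.
vasaparken_failure = date(2020, 8, 11)
medicinareberget_failure = date(2020, 9, 18)

years = hours_to_years(FAILURE_THRESHOLD_HOURS)
gap_days = days_between(vasaparken_failure, medicinareberget_failure)
gap_hours = gap_days * 24

print(f"40,000 hours is about {years:.2f} years of continuous use")
print(f"Gap between the two failures: {gap_days} days ({gap_hours} hours)")
```

Under the continuous-operation assumption, the limit corresponds to roughly four and a half years of use, and the window between the first failure and the second was 38 days. Because both server halls used disks with the same fault, the redundancy could only protect e-mail for as long as that window lasted.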
The internal audit believes that the IT unit needs to establish working routines to minimize the risk of similar events happening in the future. This applies in particular to backup within the IT environment, a strengthened management function for licenses and service agreements, and clearer procedures for handling information security incidents.
The university board decided to task the Vice-Chancellor with investigating the question of responsibility.
BY: Thomas Melin