Since the first Covid-19 cases began to appear in Jena at the beginning of March, Dr Michael Böhme has been recording data on cases of infection published by the city and presenting them graphically. What started as a result of his private scientific interest has now become an important source of information and the site has about 2,500 hits a day. In an interview, Böhme, who works at the Institute for Inorganic and Analytical Chemistry, explains why a detailed collection of data is important and what can be mathematically derived from such graphics.
You prepare coronavirus statistics for the city. How did this come about?
Like many others, I became interested in this topic because of the reports from China. It became personally relevant to me, at the latest, when the first cases were confirmed in Thuringia at the beginning of March. Fortunately, the city of Jena started publishing the confirmed numbers of cases on its homepage at an early stage. However, I did not consider the mere reports of the daily increases in the number of confirmed Covid-19 infections to be very meaningful. To assess the situation properly, it was important for me personally to consider these case numbers over time and thus be able to describe their development. On 17 March, my own scientific curiosity drove me to begin writing a first Python script, which automatically retrieves the numbers of cases from the city of Jena’s homepage and displays them graphically on my homepage.
Eventually, I shared this project on Twitter and it has grown steadily over time. At some point, the city of Jena’s social media team noticed my work and wanted to embed my diagrams on their homepage. Meanwhile, the Free State of Thuringia also started publishing extensive daily statistics in the form of a large table, and increasing numbers of municipalities in Thuringia are now putting the latest numbers of cases on their websites. That was the incentive for me to document these data as well and to present them in graphics.
Why is the graphical analysis of such data important and why should these data be publicly accessible?
The measures taken and the actions of all of us have a direct influence on the speed at which the virus spreads, which, in my opinion, can be recorded quickly using graphics. I therefore consider the public accessibility of data relating to Covid-19 to be very important within the concept of open data. At this point I would like to mention Iceland’s official Covid-19 statistics page as a very commendable example of open data. It not only graphically displays the number of cases over time, but also shows a lot of additional anonymised data, such as the number of tests performed, the origin of the infection and the regional spread. Additionally, all the raw data can be downloaded directly from the site. I consider such a high degree of transparency to be very beneficial in terms of keeping the public informed and it will certainly ensure greater acceptance of the measures adopted. Unfortunately, I have noticed that in Thuringia, the availability and quality of the figures on cases, and how up-to-date they are, are still far behind Iceland’s example and there are also strong regional differences. I would like to see our official sources achieve a similar level of transparency and willingness to inform. The daily statistics of the Thuringian state government, which provides data on all the regions of Thuringia, can certainly be expanded.
How is your current work on coronavirus statistics being conducted?
I have now fully automated the recording of data and their presentation, which is what makes it possible to produce the large number of graphics and the resulting information content. I obtain the data directly from the websites of the municipalities, the Free State of Thuringia and the Robert Koch Institute. I try to handle these websites and their data to the best of my knowledge and belief, and I have therefore made the list of sources transparent on my GitHub page. Almost every day, minor adjustments have to be made to the relevant programs to ensure that automation is maintained. This was particularly time-consuming at the start of the project, because the information content and the way the data were presented on the websites changed almost daily. Some municipalities, for example, summarise their case numbers in tabular form and others as text. It is especially difficult for programs to collect the data automatically if they are in the form of text, so I always have to invest some time in checking the data. The programs that collect the data also have to be flexible, for example in case the website of a municipality is not accessible. Unfortunately, I have to say that a few municipalities in Thuringia do not provide information from their own local health department.
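To illustrate why case numbers published as running text are harder to collect than a table, here is a minimal Python sketch. It is not Böhme's actual code: the pattern, the function names `extract_case_count` and `safe_update`, and the sample sentences are all assumptions made for this example. It pulls the first case count out of a sentence and keeps the previous value if a website cannot be reached or its wording has changed.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: a number followed by "cases" or "infections",
# optionally preceded by "confirmed". Real pages would need their own patterns.
CASE_PATTERN = re.compile(
    r"(\d[\d.]*)\s+(?:confirmed\s+)?(?:cases|infections)", re.IGNORECASE
)


def extract_case_count(text: str) -> Optional[int]:
    """Return the first case count found in free text, or None if nothing matches."""
    match = CASE_PATTERN.search(text)
    if match is None:
        return None
    # Remove German-style thousands separators ("1.234") before converting.
    return int(match.group(1).replace(".", ""))


def safe_update(fetch: Callable[[], str], previous: Optional[int]) -> Optional[int]:
    """Call fetch(); keep the previous value if the page is unreachable or unparsable."""
    try:
        text = fetch()
    except OSError:  # network errors such as urllib.error.URLError derive from OSError
        return previous
    count = extract_case_count(text)
    return previous if count is None else count


# Made-up sample sentence, for illustration only.
print(extract_case_count("There are currently 155 confirmed cases in the city."))
```

Because a small change in wording already defeats a pattern like this one, a manual check of the extracted values remains necessary.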
I cannot say exactly how much time I have put into this project so far. However, I now receive daily e-mails with thanks, criticism and new ideas, which motivate me to continue this project. I also get support in other ways, because a good friend has helped me to develop an interactive map of Thuringia for my site, which visualises the Thuringian government’s statistics that are updated daily.
What are the possibilities for representing such growth rates graphically and what is each individual selection based on?
In the simplest case, the numbers of Covid-19 infections will be presented in a chronological sequence. However, for a lot of people, including myself, exponential growth is not always easy to comprehend. Therefore, it is also possible to represent the numbers of cases logarithmically, which produces a straight line for exponential growth. The doubling time can in turn be derived from the slope of these straight lines. On my page, I also use auxiliary lines in such a diagram to make a visual estimate easier. Furthermore, with regard to Thuringia, I notice large fluctuations in the number of new infections each day. For this reason, I have started presenting not only the daily new infections, but also the sum of cases per calendar week as a bar diagram. My goal is always, through appropriate presentation, to make the figures easy to understand and quick to grasp, in line with the saying that “a picture is worth a thousand words”.
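The link between the slope and the doubling time can be made explicit. If the cases grow as N(t) = N0 · exp(k · t), then ln N(t) = ln N0 + k · t is a straight line with slope k, and the doubling time is T2 = ln(2) / k. The following Python sketch fits that slope by ordinary least squares. The function name `doubling_time` and the sample numbers are assumptions for illustration and are not taken from Böhme's site.

```python
import math


def doubling_time(days, cases):
    """Estimate the doubling time from a least-squares line through ln(cases).

    Returns math.inf when the fitted slope is zero or negative, because the
    numbers are then not growing and no doubling time exists.
    """
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx  # growth rate k per day
    if slope <= 0:
        return math.inf
    return math.log(2) / slope


# Made-up series that doubles every day, so the result should be close to 1.
days = [0, 1, 2, 3, 4]
cases = [10, 20, 40, 80, 160]
print(round(doubling_time(days, cases), 3))
```

Such a fit only describes the period it was fitted to. It is a snapshot of current growth, not a forecast.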
How often do you update your Covid-19 graphics?
The programs that automatically retrieve the data from the municipal websites are currently running in an hourly cycle. In other words, it takes a maximum of one hour for the graphics to be updated as soon as the number of cases changes on one of the web pages recorded. The individual statistics of the state and the municipalities are updated at different times, so that depending on the time of day, one or the other statistic provides the more up-to-date figures and there are also differences between the figures. The Robert Koch Institute, for example, publishes its statistics – accurate as at 00:00 hours – at 8 o’clock every morning. The Thuringian state government follows in the early afternoon with statistics accurate as at 10:00 a.m. Municipalities such as the city of Jena compile their daily statistics in the evening.
If we look at the graph for current numbers of infections in Thuringia (Jena), what can be read from such a curve?
The simple numbers of cases are going to grow steadily over time; for this reason, I prefer to look at the “active cases”. This value is the result of the total number of people infected minus those who have recovered and those who have died. For Thuringia as a whole, we can currently see that since 14 April 2020, the number of “active cases” has stagnated at around 500 for a week. Looking at developments at regional level, however, we can see fundamental differences. Most of the new infections each day currently come from the districts of Greiz and Gotha, whereas in most of Thuringia’s other regions, this number is growing very slowly, and in some cases with doubling times of over three weeks.
The current figure for Jena (21 April 2020) shows that the number of confirmed cases has remained at 155 for 12 days. At the same time, the number of recoveries is growing quickly, causing the number of active Covid-19 cases, which peaked on 1 April, to decrease steadily. Even without resorting to more complex calculations, the development of active cases over time already shows that we have achieved a reproduction number R of well below 1.0 in Jena. It is always important to note, however, that these statements are only valid when considering the confirmed cases, and that the number of unreported cases is completely neglected. In addition to the city of Jena’s figures, I supplement the data with the age data provided by the Robert Koch Institute. These show that in Jena, the 15 to 34 age group currently represents the largest number of infections. This is a fact that probably correlates directly with the lower average age in the city. As a consequence, in relation to the total number of Covid-19 cases, the people in the city of Jena who are infected become less seriously ill and there are fewer deaths compared with other municipalities in Thuringia.
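The “active cases” figure described above is a simple subtraction per day: confirmed cases minus recoveries minus deaths. A short Python sketch of this bookkeeping follows. The function names `active_cases` and `peak_day` are chosen for this example, and the numbers are invented. They are not the Jena or Thuringia data.

```python
def active_cases(confirmed, recovered, deceased):
    """Active cases per day: confirmed minus recovered minus deceased."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deceased)]


def peak_day(dates, active):
    """Date with the highest number of active cases (the first one on ties)."""
    index = max(range(len(active)), key=active.__getitem__)
    return dates[index]


# Invented cumulative totals for four consecutive days, for illustration only.
dates = ["day 1", "day 2", "day 3", "day 4"]
confirmed = [100, 140, 155, 155]
recovered = [10, 30, 60, 90]
deceased = [0, 1, 2, 2]

active = active_cases(confirmed, recovered, deceased)
print(active, peak_day(dates, active))
```

In this invented series the confirmed total stops rising while recoveries continue, so the active cases fall after their peak, which is the pattern described for Jena.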
What could be extrapolated from such curves?
I use exponential functions on my site in order to generate a snapshot based on current numbers and to determine parameters such as the doubling time. As far as extrapolations for the future are concerned, I have decided not to make forecasts based on the data, even though I have received an increasing number of inquiries by e-mail on this subject. I would like to – and have to – leave that to the experts. In my opinion, it would be reckless because other people might rely on such forecasts. In addition, I find it very difficult to make forecasts at present, as regulations and measures change almost weekly, such as the introduction of compulsory face masks or the gradual reopening of schools. I therefore think that far more complex mathematical models are needed, which go well beyond my expertise as a chemist.
As a data expert, what is your opinion on information retrieval through mobile phone tracking?
I must admit that I have not yet formed a final opinion on the topic of a “Corona App”. I am aware that a number of such apps are already being tested and that if one wants to use this kind of information gathering, society will ultimately have to agree on an app on a voluntary basis. In this context, I consider the list of 10 assessment criteria for such apps published by the Chaos Computer Club to be relevant and ground-breaking. It is only possible to obtain usable data with such measures if they are widely accepted by the population. It is therefore essential for data protection concerns to be taken into account as well and for the source code to be publicly available. But what I believe should not be allowed under any circumstances is a ruling obliging the public to install such a tracking app. I believe that a general obligation would be too invasive as far as individual liberties are concerned, and could do immense damage to the public’s trust in our constitutional state. And to be quite pragmatic: with a compulsory app, one could still not stop people who do not want to be tracked through such an app from simply switching off their phone or leaving it in the car when shopping.
For how long will you continue with the corona statistics?
As experts keep pointing out, we are still at the early stages of this pandemic. Therefore, I consider it essential to document the data continuously, even though new infections are currently on the decline and public interest may be diverted to other issues in the course of the year. Another important point is the plan for a gradual reduction in the measures, which must be re-evaluated on a regular basis. By providing the graphics, I aim to help educate the population more broadly about the current coronavirus situation on the ground. In this respect, I will continue to collect and present the data for the time being.
What else do you focus on in your scientific work?
I work as a theoretical chemist in Prof. Winfried Plass’s working group at Friedrich Schiller University and my scientific work focuses on computer-aided calculations of magnetic molecules, mainly what are called single-molecule magnets. Such molecules are small permanent magnets that can be individually magnetised. They could potentially be suitable as magnetic data storage media and could significantly increase the storage density of data carriers. Through my scientific calculations on such systems, I verify and complement the experimental results of my colleagues, allowing us to identify trends and improve the properties of these molecules. In one context or another, I also work with exponential functions, for example when describing temperature-dependent magnetic relaxation behaviour or when estimating the magnetic domain sizes in single-chain magnets.