The COVID-19 pandemic has been one of the most challenging situations ever faced by governments across the world. The last pandemic of comparable scale was the Spanish flu of 1918, roughly 100 years ago, and hardly anyone alive today can recall those times. We therefore lack living banks of knowledge on how to deal effectively with a pandemic. What the world could look like, what protocols to follow, what the ‘new normal’ is going to be and how many people we are going to lose are uncomfortable questions to which we do not have answers. It was at this juncture of uncertainty that CASPR prudently decided to host Dr. Neeraj Hatekar (Professor of Economics, University of Mumbai), who has developed an efficient econometric model to predict COVID-19 infections across various states in the country. (P.S. This webinar was held on 10th May, so the data and this write-up follow that timeline.)
Right at the beginning of the presentation, Dr. Hatekar outlines why he has limited faith in the SIR models that have been used to predict outbreaks both in the past and at present. He cites two flawed predictions. SIR models had predicted that Maharashtra would have 2 crore COVID-19 infections by the end of May, which seems far from happening. Similarly, these models had predicted 65,000 deaths during the mad-cow disease outbreak in the USA, when actual fatalities hovered around 470. The reason for such anomalous predictions, Dr. Hatekar argues, is that these models try to predict the entire game before the match has even started, which leaves plenty of room for them to go wrong. They essentially do not factor in variables like changes in people’s behaviour or in government policy: with migrant workers returning to their native villages, the whole corona equation is set to change, and naturally all SIR predictions will fail. Hatekar therefore adopts the standard time series model of econometrics along with the bootstrapping method. By studying countries that have completed the disease transition, the bootstrapping approach attempts to forecast cases and develop methods to track them. Hatekar states plainly that the data used are reported confirmed positive cases (by state) and that they carry limitations, such as not accounting for delayed lab reports or random sampling. The model is nevertheless said to be more accurate than SIR models in predicting infections.
Hatekar then outlines two concepts for studying current infections: the ‘Logistic Growth Curve’ (LGC) and the ‘Carrying Capacity’ (CC). The LGC is the curve showing initial slow growth followed by a faster pace of growth up to the maximum point (peak) of corona infections, while the CC points at the maximum number of infections a state’s healthcare system can possibly support. He provides a formula for calculating the LGC: $N_t = \frac{K}{1 + \left(\frac{K - N_0}{N_0}\right) e^{-rt}}$, where $N_t$ is the cumulative size of the infected population at time period $t$, $K$ is the final CC, $N_0$ is the initial size and $r$ is the rate of growth in infections if it continues unchecked. The model essentially estimates $K$ and $r$. It uses data from after the declaration of the second nationwide lockdown, since at least 100 cases had to be reported per state to have sufficient data to work with. He then goes on to show the LGCs of various states from the data collected. While states like Gujarat, Odisha, Punjab, Delhi and Tamil Nadu seem to have increasing corona cases, others like Kerala, Bihar, Himachal Pradesh and Uttarakhand show an S-shaped curve, indicating a decrease in cases. Hatekar also speaks of another method of tracking infections, the ‘doubling period’. Departing from the traditional definition of the doubling period as the number of days required to go from x to 2x cases, he says that the real doubling period is actually unobservable and that the doubling period we witness is only an estimated realization of it.
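To make the logistic formula concrete, here is a minimal Python sketch of the curve itself. The parameter values are illustrative assumptions and are not taken from the webinar; the helper `inflection_time` is a hypothetical name for the day on which the curve reaches K/2, which is where daily new cases peak under this formula.

```python
import math

def logistic_cases(t, K, N0, r):
    """Cumulative cases N_t = K / (1 + ((K - N0) / N0) * exp(-r * t))."""
    return K / (1.0 + ((K - N0) / N0) * math.exp(-r * t))

def inflection_time(K, N0, r):
    """Day on which N_t reaches K / 2 (daily new cases peak here)."""
    return math.log((K - N0) / N0) / r

# Illustrative values only (assumptions, not estimates from the talk).
K, N0, r = 10000.0, 100.0, 0.15

curve = [logistic_cases(t, K, N0, r) for t in range(0, 121)]
t_peak = inflection_time(K, N0, r)

print("cases on day 0:", curve[0])
print("day of fastest growth:", round(t_peak, 1))
print("cases on day 120:", round(curve[-1]))
```

In practice K and r would be estimated from each state's reported data rather than fixed by hand; the sketch only shows how the curve behaves once they are known.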
Thus, ‘the worm’ (the graph) that is portrayed is based on the value of ‘r’ ($\Delta X_t / \Delta X_{t-1}$), which simply compares the number of cases recorded today with the number recorded yesterday (not to be confused with the growth rate r in the logistic formula). Since this is estimated over a specific time period (say 15 days), 1000 bootstrap samples are drawn from that window of data every day. We then average the 1000 ‘r’ values obtained to get a single coefficient value of ‘r’ for each day, which is finally plotted on a graph called the ‘worm’. Here r<1 indicates a ‘non-exponential phase’ (fewer infections, with the doubling period increasing), r>1 indicates an ‘exponential phase’ (more cases, with the doubling period decreasing) and r=1 indicates a ‘switching phase’, in which the state is shifting either from non-exponential to exponential growth or vice versa. To avoid confusion regarding the number of cases per day, the width of the worm portrays that number. When Hatekar plots the worm for different states, we observe that, except for states like TN, Maharashtra, Delhi and Gujarat, most other states studied are in the non-exponential phase, with the doubling period going up. Thus, reported infections are going down in most of the country.
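The daily ‘r’ behind the worm can be sketched roughly as follows. The write-up does not specify the exact resampling scheme, so this sketch assumes the simplest one: resample the day-over-day ratios within the window with replacement and average the results. The function names, the tolerance band around 1 and the case counts are all hypothetical.

```python
import random
from statistics import mean

def bootstrap_r(daily_new_cases, n_boot=1000, seed=0):
    """Average day-over-day ratio of new cases across bootstrap resamples."""
    ratios = [
        today / yesterday
        for yesterday, today in zip(daily_new_cases, daily_new_cases[1:])
        if yesterday > 0  # the ratio is undefined when yesterday had no cases
    ]
    if not ratios:
        raise ValueError("need at least one valid day-over-day ratio")
    rng = random.Random(seed)
    # Each bootstrap sample redraws the window's ratios with replacement.
    boot_means = [mean(rng.choices(ratios, k=len(ratios))) for _ in range(n_boot)]
    return mean(boot_means)

def phase(r, tol=0.02):
    """Classify r; the tolerance band around 1 is an assumption."""
    if r > 1 + tol:
        return "exponential"
    if r < 1 - tol:
        return "non-exponential"
    return "switching"

# Hypothetical 15-day window of daily new cases for one state.
window = [40, 44, 47, 53, 58, 61, 70, 76, 85, 90, 101, 110, 118, 131, 142]
r_today = bootstrap_r(window)
print(round(r_today, 3), phase(r_today))
```

Repeating this for each day, with the window sliding forward, would give the sequence of r values that traces out the worm.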
Dr. Hatekar then comes to the final part of the presentation, in which he displays his predictive model, the ‘time series forecasts’, where he develops short-term ARIMA forecasts for each state every day. Unlike the SIR models, Hatekar’s model prefers a ball-by-ball prediction, with predictions available only for the next 10-15 days at most, since the whole corona situation is in flux. To this end, he generates 1000 bootstrap time series samples every day and fits each of them to 15-20 different ARIMA models to find the one that suits each replication best. Thus, he effectively generates around 1000 forecasts from around 10,000-12,000 models every day for each state. With the forecasted value on the x-axis and the actual data on the y-axis, the correlation between actual and forecasted values in Dr. Hatekar’s model is 0.97, indicating a highly accurate model. He then shows the next day’s predicted cases as triangles against today’s cases as circles, which shows that, apart from Tamil Nadu (where reported cases are decreasing), Delhi, Maharashtra and Gujarat will see a huge spike in the number of reported cases. From this we can conclude that states like Maharashtra, Delhi, Gujarat and TN are the main drivers of COVID-19 in India, which might be an advantage in certain ways. If the spread of infection is geographically restricted, localized lockdowns can help in containing it. At present, however, with migrant workers returning to their native villages, there is huge concern about the spread of corona to places with dismal healthcare systems, which could result in even more deaths (besides the migrant workers dying on the roads as a result of the faulty policies of the government). Hatekar nevertheless concludes that India has not yet lost its grip on corona and expects that, with due monitoring of certain cities, we will be able to gain control of the pandemic by the end of May.
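The forecasting idea can be illustrated with a much simpler stand-in than the procedure described above. This sketch is not Dr. Hatekar's method: instead of selecting among 15-20 ARIMA models per replication, it assumes a single ARIMA(1,1,0)-style model without a constant (an AR(1) on daily new cases, fitted by least squares) and bootstraps its residuals to produce 1000 forecast paths. All names and data are hypothetical.

```python
import random
from statistics import mean

def fit_ar1(diffs):
    """Least-squares AR(1) coefficient (no intercept) and its residuals."""
    x, y = diffs[:-1], diffs[1:]
    denom = sum(a * a for a in x)
    if denom == 0:
        raise ValueError("series has no variation to fit")
    phi = sum(a * b for a, b in zip(x, y)) / denom
    resid = [b - phi * a for a, b in zip(x, y)]
    return phi, resid

def forecast_path(cumulative, phi, shocks):
    """Roll the fitted model forward, one resampled shock per day."""
    level = cumulative[-1]
    last_diff = cumulative[-1] - cumulative[-2]
    path = []
    for shock in shocks:
        last_diff = phi * last_diff + shock  # next day's new cases
        level += last_diff                   # next day's cumulative cases
        path.append(level)
    return path

def bootstrap_forecast(cumulative, horizon=10, n_boot=1000, seed=0):
    """Mean forecast of cumulative cases over residual-bootstrap paths."""
    diffs = [b - a for a, b in zip(cumulative, cumulative[1:])]
    phi, resid = fit_ar1(diffs)
    rng = random.Random(seed)
    paths = [
        forecast_path(cumulative, phi, rng.choices(resid, k=horizon))
        for _ in range(n_boot)
    ]
    return [mean(day) for day in zip(*paths)]

# Hypothetical cumulative reported cases for one state.
cases = [100, 112, 127, 140, 158, 175, 196, 214, 239, 262, 290, 315, 347]
print([round(v) for v in bootstrap_forecast(cases, horizon=10)])
```

Keeping the horizon to about 10 days mirrors the ball-by-ball spirit of the talk: the forecast would be regenerated each day as new data arrive rather than extended far into the future.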
A lot of time has passed since the webinar and India still sees no peak in sight, all the while recording its highest number of daily cases along with record daily deaths. The situation is indeed alarming and needs immediate monitoring. If the ‘Kerala model’ is to truly succeed, more has to be invested in health, and the healthcare system has to be made a strong backbone against any such future epidemics. Moreover, the quality of living in slums and shantytowns has to be improved, because it is here that, due to malnutrition (weak immunity) and unhygienic living conditions, diseases are more likely to spread. As Hatekar rightly observes, after a point of time lockdowns don’t really matter and it all depends on how effective governments are. While countries like New Zealand and Israel have effectively controlled the pandemic within their borders, unstable countries like Syria and Zimbabwe have seen no effect from lockdowns. India has only been able to slow down the rate of infections and still has a long way to go in overcoming the coronavirus.