Along with the health crisis/apocalypse of COVID-19, there is a data-reporting disaster happening.
Ten provinces and three territories all seem to have different data reports on their public health websites. Some report aggregated data, some report easily found numbers of cases per health unit in the province or territory, and some are changing the data formats as they go along, abandoning old formats and reports. None seem to be reporting a snapshot of the data from the beginning of testing to today. There are reports for the current day but not for yesterday. Without the daily progression of the number of tests, the number of positives and the number of deaths, it is difficult to determine whether any of our sacrifices are paying off. For example, for most of March Ontario reported case numbers 31 to 858 with Public Health Unit info, then stopped.
We cannot confirm the peak or the plateau or the down side of the peak, without accurate, timely data. Since I started writing these pages in March, the data reporting has improved, but testing is still deficient, and there does not seem to be a consistent pan-Canada approach. Chaos ensues.
Alberta is probably the best for reporting data, with a geographic breakdown that should be the standard.
Canada has aggregated the data and provided a downloadable .csv (comma-separated values) file, easily used by data wonks everywhere. And then they changed the format of that, and they do not include all the data advertised.
For the report ending March 28 (who does dates as dd-mm-yyyy anyhow?) there were 10 data fields advertised in row 1, but only 9 data fields are filled in, with numtested empty.
pruid,prname,prnameFR,date,numconf,numprob,numdeaths,numtotal,numtoday,numtested
35,Ontario,Ontario,31-01-2020,3,0,0,3,3,
59,British Columbia,Colombie-Britannique,31-01-2020,1,0,0,1,1,
1,Canada,Canada,31-01-2020,4,0,0,4,4,
...
1,Canada,Canada,28-03-2020,5386,39,59,5425,736,
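For the wonks, here is a minimal sketch of how one might check which advertised fields are actually filled in. It uses only Python's standard csv module and the sample rows above (the elided "..." rows are left out). The function name empty_field_counts is mine, not anything from the government file, and a real check would read the downloaded file instead of a pasted string.

```python
import csv
import io

# Sample rows copied from the March 28 report shown above
# (the "..." rows in between are not included).
SAMPLE = """\
pruid,prname,prnameFR,date,numconf,numprob,numdeaths,numtotal,numtoday,numtested
35,Ontario,Ontario,31-01-2020,3,0,0,3,3,
59,British Columbia,Colombie-Britannique,31-01-2020,1,0,0,1,1,
1,Canada,Canada,31-01-2020,4,0,0,4,4,
1,Canada,Canada,28-03-2020,5386,39,59,5425,736,
"""


def empty_field_counts(csv_text):
    """Return (row_count, {column: number of rows where it is empty})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in reader.fieldnames:
            # A short row yields None for missing columns; treat it as empty.
            if not (row[name] or "").strip():
                counts[name] += 1
    return total, counts


total_rows, empties = empty_field_counts(SAMPLE)
for name, n in empties.items():
    if n:
        print(f"{name}: empty in {n} of {total_rows} rows")
```

On these four rows the only column flagged is numtested, which is exactly the complaint.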
Then, they added another field, percentoday, and numtested is still empty. But, percent of what today? The lack of data for numtested is a huge miss. One of the daily data points one would really want (really, really want) is the number tested, so we can form an opinion on the per capita testing. Admittedly, old negatives from February may not matter. Those people may all have been infected in the meantime. But, statistically, the more tests there are, the better data models work as predictive tools.
pruid,prname,prnameFR,date,numconf,numprob,numdeaths,numtotal,numtoday,percentoday,numtested
35,Ontario,Ontario,31-01-2020,3,0,0,3,3,3.000,
59,British Columbia,Colombie-Britannique,31-01-2020,1,0,0,1,1,1.000,
1,Canada,Canada,31-01-2020,4,0,0,4,4,4.000,
...
1,Canada,Canada,29-03-2020,6255,3,61,6258,833,0.154,
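One guess at what percentoday means, and it is only a guess from the sample rows, not anything documented: the day's new cases divided by the previous day's total, expressed as a fraction. The Canada row for March 29 fits that reading, since the previous day's numtotal was 5425. The first-day rows (3.000, 1.000, 4.000) do not fit, because the previous total there would be zero, so treat this as a sketch of an assumption.

```python
# Values taken from the two Canada rows shown above.
prev_total = 5425   # numtotal on 28-03-2020
numtoday = 833      # numtoday on 29-03-2020
reported = 0.154    # percentoday on 29-03-2020

# ASSUMPTION: percentoday = new cases today / total cases yesterday.
guess = numtoday / prev_total

print(f"guess = {guess:.3f}, reported = {reported:.3f}")
```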
UPDATE: April 7. Canada changed the format of the csv file again, and is still not filling in all the fields. I do not understand why they provide calculated items like percents and rates in the csv. They should provide raw data and let the wonks calculate percents, rates etc. By the end of this, I will either be an authoritarian, demanding that government rule all statistics reporting with an iron fist, or a libertarian, asserting that government has no role in statistics or public health.
UPDATE: April 25. Canada started reporting number tested on April 03, but as an aggregate number. Looks impressive but if you break it down to the daily delta, the numbers are stupidly small, for a wealthy country.
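Breaking a cumulative column down to the daily delta is a short job. This is a minimal sketch, and the numbers in it are MADE-UP placeholders, not Canada's actual test counts; only the dd-mm-yyyy date format comes from the file. The function name daily_deltas is my own.

```python
from datetime import datetime

# HYPOTHETICAL cumulative totals, for illustration only.
# These are not real numtested values.
cumulative = [
    ("03-04-2020", 1000),
    ("04-04-2020", 1250),
    ("05-04-2020", 1400),
    ("06-04-2020", 1900),
]


def daily_deltas(rows):
    """Turn (dd-mm-yyyy, cumulative) pairs into (date, daily increase) pairs."""
    # Parse the dates so sorting is chronological, not alphabetical.
    parsed = sorted(
        (datetime.strptime(day, "%d-%m-%Y").date(), count) for day, count in rows
    )
    # Pair each day with the one before it and subtract.
    return [
        (cur_day, cur - prev)
        for (_, prev), (cur_day, cur) in zip(parsed, parsed[1:])
    ]


deltas = daily_deltas(cumulative)
for day, increase in deltas:
    print(day.isoformat(), increase)
```

The first day has no delta because there is nothing before it to subtract, which is one more reason a cumulative number on its own tells you so little.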
Journalists in the United States seem to have set up their own attempt at consolidating data, but complain of the same basic issues as Canada: missing data, inconsistent reporting with each state using its own parameters and formats, difficulties getting past days' data, and format changes.
News media like the CBC, the Globe, the Guardian and others have reported on the difficulties of getting access to PUBLIC health data. The access right to the data is in the name: PUBLIC health data. It is not just reporters and hobbyist data wonks that complain. CIFAR and medical researchers at UofT have complained in the media. Data scientists in Toronto, Vancouver, Montreal and Alberta are among the best in the world. Give them access to all the data, all the time. Predictive tools and better models will flow, quickly, from fuller access. We are crippling our efforts with bad data or poor access to data. One of the billions of dollars freed up by the federal and provincial governments should be earmarked to put all the provinces and territories on an even footing for reporting COVID-19 data. COVID-19 will be with us for a long time; we might as well help ourselves fight it.
On a given day, I am inundated with instantly-accessible misinformation about the virus and the disease, but isolated in a data desert. Fix that.
And what is going on today, March 30, with Quebec having 3430 cases and Ontario having 1706? Ontario has 14 million people and Quebec has 8 million. There is a disaster of something: more testing in Quebec, or under-testing in Ontario, or under-reporting in Ontario, or less adherence to distancing in Quebec, or what the heck is going on?
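To put the gap in per capita terms, here is a quick sketch using only the figures above. The populations are my round numbers, not census counts, so the results are rough.

```python
# Case counts and rounded populations as quoted above for March 30.
reported = {
    "Quebec": {"cases": 3430, "population": 8_000_000},
    "Ontario": {"cases": 1706, "population": 14_000_000},
}


def cases_per_100k(cases, population):
    """Reported cases per 100,000 residents."""
    return cases / population * 100_000


rates = {name: cases_per_100k(**data) for name, data in reported.items()}
ratio = rates["Quebec"] / rates["Ontario"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} reported cases per 100,000")
print(f"Quebec / Ontario: {ratio:.1f}x")
```

That works out to roughly 43 reported cases per 100,000 in Quebec against about 12 in Ontario, around three and a half times the rate, which is far too big a gap to shrug off.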
This story from the Globe on the morning of March 31 explains Quebec's skyrocketing cases,
and the Star shows Ontario's weakness in testing,
and testing and access to data are addressed in an op-ed in the Globe by some smart guys at UofT and McGill.
iPolitics amplifies the message with these thoughts from Valerie Percival.
The message to governments is test and test again, and share the data. Test for the COVID-19 virus, and test for immunity. I can more easily find out what Cardi B is wearing, right now, than find out what the age/gender/geographic breakdown is for cases in each province.
As COVID-19 devastates people, layoffs have also affected access to medicines for those who lose their workplace benefits. Maybe it is well past time for the universal pharmacare doo-hickey.
And again, there is more about the Data Attention Deficiency in Canada in the Globe; read the article linked in the story, too.
And PHAC was ill-prepared, despite the SARS-thingy warning, and assurances of being in good shape.
actual photo of the virus