
Project Tycho Level 2 Data

Please note: Version 2 of Project Tycho is now available. This listing contains Version 1.1.0 (Level 2) data and was last updated on 2017-12-07.

1. Dataset Content and Format

Project Tycho data include counts of infectious disease cases or deaths per time interval. A count is equivalent to a data point.

Project Tycho level 2 version 1.1.0 data consist of counts that have been filtered from the raw data to yield standardized data that can be used immediately for analysis. All level 2 data were originally reported in a consistent format and have not been transformed into a standard format by Project Tycho staff, with one exception: some smallpox records contained repeated counts for the same location and week, sometimes with different numbers. These duplicate smallpox records have been averaged into a single count for each location and week. Level 2 data cover a wide variety of diseases and locations over varying time periods. Because data reported in an inconsistent format were removed from level 2, counts may be missing for certain diseases, locations, or years. For the most complete collection of standardized data, we encourage users to use the Project Tycho version 2.0 datasets.
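The duplicate-averaging step described above can be sketched as follows. This is a minimal illustration, not Project Tycho's actual processing code. It assumes each record is a dictionary keyed by the CSV column names, treats a location as the (state, loc) pair so that cities with the same name in different states stay separate, and uses the arithmetic mean; whether the published averages were rounded is not stated in this listing. The function name average_duplicates and the example rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(records):
    """Collapse repeated (state, loc, epi_week) records into one mean count.

    Hypothetical helper. `records` is an iterable of dicts with at least the
    keys 'state', 'loc', 'epi_week' and 'number'. Returns a dict mapping
    (state, loc, epi_week) to the arithmetic mean of the reported numbers,
    always as a float.
    """
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["state"], rec["loc"], rec["epi_week"])
        grouped[key].append(float(rec["number"]))
    # One averaged count per location and week.
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical rows: two conflicting reports for the same city and week.
    rows = [
        {"state": "MA", "loc": "BOSTON", "epi_week": "190101", "number": "4"},
        {"state": "MA", "loc": "BOSTON", "epi_week": "190101", "number": "6"},
        {"state": "IL", "loc": "CHICAGO", "epi_week": "190101", "number": "3"},
    ]
    print(average_duplicates(rows))
    # {('MA', 'BOSTON', '190101'): 5.0, ('IL', 'CHICAGO', '190101'): 3.0}
```

Note that averaging can produce non-integer counts, so downstream code should not assume every value in the number column is a whole number.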

More detailed methods and additional information about the origin of Project Tycho level 2 version 1.1.0 data can be found in our original publication in the New England Journal of Medicine: http://www.nejm.org/doi/full/10.1056/NEJMms1215400

Level 2 version 1.1.0 data are provided as a CSV file with 11 columns:
- epi_week: a six-digit number giving the year and epidemiological week for which disease cases or deaths were reported (yyyyww)
- country: a two-letter country abbreviation; only "US" appears in version 1.1.0
- state: the two-letter postal abbreviation of the state for which a count was reported
- loc: the name of the state or city for which a count was reported, capitalized
- loc_type: the type of location (STATE or CITY) for which a count was reported
- disease: the disease for which a count was reported, in all capitals
- event: an indicator of the disease outcome reported, including "CASES" or "DEATHS"
- number: the reported number of cases or deaths
- from_date: the start date of the time interval for which a count was reported, as yyyy-mm-dd
- to_date: the end date of the time interval for which a count was reported, as yyyy-mm-dd
- url: the URL of the source document from which the count was obtained
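The sketch below reads a level 2 CSV file with Python's standard csv module, splits epi_week into year and week, and parses the two date columns. It assumes the file has a header row using the 11 column names listed above; the TychoRecord class and the parse_epi_week and read_level2 functions are hypothetical helpers, not part of any Project Tycho software. The number column is parsed as a float on the assumption that averaged smallpox counts may not be whole numbers, and the sample row is made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class TychoRecord:
    """One level 2 count (hypothetical container, one field per CSV column)."""
    year: int
    week: int
    country: str
    state: str
    loc: str
    loc_type: str
    disease: str
    event: str
    number: float
    from_date: date
    to_date: date
    url: str


def parse_epi_week(value):
    """Split a six-digit yyyyww string into (year, week)."""
    value = value.strip()
    if len(value) != 6 or not value.isdigit():
        raise ValueError(f"expected yyyyww, got {value!r}")
    return int(value[:4]), int(value[4:])


def read_level2(fh):
    """Yield TychoRecord objects from an open level 2 CSV file handle.

    Assumes a header row with the column names epi_week, country, state,
    loc, loc_type, disease, event, number, from_date, to_date, url.
    """
    for row in csv.DictReader(fh):
        year, week = parse_epi_week(row["epi_week"])
        yield TychoRecord(
            year=year,
            week=week,
            country=row["country"],
            state=row["state"],
            loc=row["loc"],
            loc_type=row["loc_type"],
            disease=row["disease"],
            event=row["event"],
            number=float(row["number"]),  # float: averaged counts may be fractional
            from_date=date.fromisoformat(row["from_date"]),
            to_date=date.fromisoformat(row["to_date"]),
            url=row["url"],
        )


if __name__ == "__main__":
    # Hypothetical one-row sample; values are illustrative only.
    sample = io.StringIO(
        "epi_week,country,state,loc,loc_type,disease,event,number,from_date,to_date,url\n"
        "190101,US,MA,BOSTON,CITY,SMALLPOX,CASES,5,1901-01-06,1901-01-12,"
        "http://example.org/hypothetical-source\n"
    )
    for rec in read_level2(sample):
        print(rec.year, rec.week, rec.disease, rec.event, rec.number)
    # 1901 1 SMALLPOX CASES 5.0
```

For a downloaded file, pass an open handle instead, for example `with open(path, newline="") as fh: records = list(read_level2(fh))`, where path is wherever you saved the CSV. Using a generator keeps memory use low for large files.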

2. Citation

Willem G. van Panhuis, John Grefenstette, Su Yon Jung, Nian Shong Chok, Anne Cross, Heather Eng, Bruce Y. Lee, Vladimir Zadorozhny, Shawn Brown, Derek Cummings, Donald S. Burke. Contagious Diseases in the United States from 1888 to the Present. NEJM 2013; 369(22): 2152-2158.

3. Contact Information

In case of questions or ideas, please contact Project Tycho via email (tycho@phdl.pitt.edu) or via the website (www.tycho.pitt.edu).