A lot of real data isn’t tidy, largely because most scientists are never taught how to structure their data so that it is easy to analyze.
Download a messy version of some of the Portal Project data. Note that there are multiple tabs in this spreadsheet.
Think about what could be improved about this data. In a text file (to be turned in as part of the assignment), answer the following:
Describe five things about this data that are not tidy and how you could fix each of those issues.
Could this data easily be imported into a programming language or a database in its current form?
Do you think it’s a good idea to enter the data like this and clean it up later, or to have a good data structure for analysis by the time data is being entered? Why?
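As a point of reference for what "tidy" means in the questions above, here is a minimal sketch of reshaping a small untidy table into one row per observation. The data, the column names (`species`, `plot_1`, `plot_2`), and the `to_tidy` helper are all hypothetical and are not taken from the Portal spreadsheet; the issues in the real file may be different.

```python
import csv
import io

# Hypothetical untidy layout: one column per plot, counts stored in the cells,
# and a blank cell standing in for a missing value.
messy_csv = """species,plot_1,plot_2
DM,3,5
PP,,2
"""


def to_tidy(text):
    """Reshape wide rows into one row per (species, plot) observation."""
    tidy = []
    for row in csv.DictReader(io.StringIO(text)):
        species = row.pop("species")
        for column, value in row.items():
            tidy.append({
                "species": species,
                # The plot number is encoded in the column name, e.g. "plot_2".
                "plot": int(column.split("_")[1]),
                # Keep blanks as missing (None) rather than guessing zero.
                "count": int(value) if value else None,
            })
    return tidy


tidy_rows = to_tidy(messy_csv)
for r in tidy_rows:
    print(r)
```

In the tidy form each variable (species, plot, count) has its own column and each row is a single observation, which is the shape most analysis tools and databases expect.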