Tuesday, August 5, 2008

Part 1 submission instructions

After some false starts, I believe we can now specify the exact form of Part 1 submissions. Our universe consists of patient visits: you give one answer for each time a patient is admitted to the hospital (so some patients will have multiple entries which, as you've seen, may have different answers). In addition, in order to judge by AUC, we need a ranked list rather than just a series of yes/no answers.
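To see why a ranking is needed, here is a minimal sketch of how AUC can be computed from an ordered list of visits. It uses the standard pairwise definition (the fraction of positive/negative pairs in which the positive visit is ranked higher); it is not the actual judging code. The function name and the toy visit labels are made up for illustration.

```python
def auc_from_ranking(ranked_ids, positive_ids):
    """AUC of a ranking, most likely visit first.

    Counts the (positive, negative) pairs in which the positive visit
    appears above the negative one, divided by the number of such pairs.
    """
    positives = set(positive_ids)
    n_pos = sum(1 for visit_id in ranked_ids if visit_id in positives)
    n_neg = len(ranked_ids) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative visit")

    correct_pairs = 0
    negatives_seen = 0  # negatives ranked above the current position
    for visit_id in ranked_ids:
        if visit_id in positives:
            # Every negative not yet seen is ranked below this positive.
            correct_pairs += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return correct_pairs / (n_pos * n_neg)


# Toy example: visits "a" and "c" truly had pneumonia.
print(auc_from_ranking(["a", "b", "c", "d"], {"a", "c"}))
```

A plain yes/no answer per visit carries no ordering within each group, which is why it cannot be scored this way.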

So, to enter Part 1, email me (nick-street@uiowa.edu) a file containing:
  • 17,687 lines (one for each record in the test hospital file)
  • Each line contains one EVNTIDX (the unique identifier for a patient visit), and nothing else
  • The visits are ordered by probability of a pneumonia diagnosis (primary or secondary), in descending order
For example, if you decide that patient 58603044 is very likely to have pneumonia every time he/she is admitted, and patient 30015017 is very unlikely to have it, then your file might look like


Only one submission per team will be judged. If you send further entries before the deadline, the newest one will replace all earlier ones.
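The file format described above can be produced with a few lines of Python. This is only a sketch: it assumes you already have a predicted pneumonia probability for each EVNTIDX in a dictionary, and the function names, file name, and placeholder IDs are hypothetical.

```python
from pathlib import Path


def rank_visits(scores):
    """Return visit IDs sorted by predicted probability, highest first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [visit_id for visit_id, _ in ordered]


def write_submission(scores, path, expected_lines=None):
    """Write one EVNTIDX per line, most likely pneumonia visit first.

    `scores` maps EVNTIDX -> predicted probability (an assumed input).
    `expected_lines` is an optional sanity check on the line count.
    """
    ranked = rank_visits(scores)
    if expected_lines is not None and len(ranked) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} lines, got {len(ranked)}"
        )
    # Only the identifier goes on each line; the probability is not written.
    Path(path).write_text("".join(f"{visit_id}\n" for visit_id in ranked))
    return ranked


# Placeholder IDs, not real EVNTIDX values.
example_scores = {"visit_A": 0.9, "visit_B": 0.1, "visit_C": 0.5}
print(rank_visits(example_scores))
```

For the real test set, passing `expected_lines=17687` would catch a file with missing or extra visits before you email it.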

Monday, August 4, 2008

Test files

The test files have been posted on the "Datasets and Documentation" page. As suggested earlier, the following cleaning has been done to (I hope) remove any leakers:

Hospital: All pneumonia ICD codes have been removed.
Medications: All records containing an ICD code for pneumonia have been removed. Also, the CCC codes (an alternative coding for diagnosis) have been removed from all records.
Conditions: All records containing an ICD code for pneumonia have been removed.
Demographics: No cleaning.
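If you want to apply the same kind of cleaning to your own training copies, something like the following sketch would do it. Everything specific here is an assumption rather than a description of how the posted files were actually produced: the field names (`ICD9CODX`, `CCCODEX`), the record layout (one dictionary per record), and the choice of three-digit ICD-9 codes 480 through 486 as "pneumonia".

```python
# Assumption: pneumonia means three-digit ICD-9 codes 480-486.
PNEUMONIA_ICD9 = {str(code) for code in range(480, 487)}


def is_pneumonia(code):
    """True if the first three characters of the code are a pneumonia code."""
    return str(code).strip()[:3] in PNEUMONIA_ICD9


def drop_pneumonia_records(records, icd_field="ICD9CODX"):
    """Medications/Conditions style: remove whole records with a pneumonia code."""
    return [r for r in records if not is_pneumonia(r.get(icd_field, ""))]


def blank_pneumonia_codes(records, icd_fields):
    """Hospital style: keep every record but blank out pneumonia codes."""
    cleaned = []
    for record in records:
        record = dict(record)  # copy so the input is not modified
        for field in icd_fields:
            if is_pneumonia(record.get(field, "")):
                record[field] = ""
        cleaned.append(record)
    return cleaned


def drop_fields(records, fields):
    """Remove columns entirely, e.g. an alternative diagnosis coding."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


# Made-up records for illustration only.
meds = [
    {"ID": "m1", "ICD9CODX": "486", "CCCODEX": "122"},
    {"ID": "m2", "ICD9CODX": "401", "CCCODEX": "098"},
]
print(drop_fields(drop_pneumonia_records(meds), {"CCCODEX"}))
```

The two styles differ on purpose: the hospital file has to keep one record per visit to be scored, so only the codes are blanked there, while the other files can lose whole records.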