We published this post previously, on June 11, but it was placed as a comment to another posting. We hope it will appear as a separate thread this time. We have a few questions, listed below:
1. Most of the patients in the hospital data set have records in the conditions data set. However, the conditions data set covers many more patients. Assuming that DUPERSID is the unique identifier for an individual patient (there are multiple records per DUPERSID), the conditions data set covers 43,151 patients, while the hospital data set covers only 11,846. Of these 11,846 patients, 11,724 are also represented in the conditions data set.
Both the conditions and the hospital data sets contain information on patients with pneumonia. There are 971 patients in the conditions data set with at least one ICD9CODX value between 480 and 486. In the hospital data set, there are 278 unique DUPERSIDs with a value between 480 and 486 in at least one of Icd1x, Icd2x, Icd3x, and Icd4x. (These 278 patients all appear to be in the conditions data set as well, with a pneumonia code on at least one record.)
Our first question relates to the relevance of the conditions data set. Since there does not appear to be any way to determine from it whether pneumonia for these patients was nosocomial, the conditions data set seems not to be relevant to classifying patients into a "likely to contract nosocomial pneumonia" group. Are we correct in this conclusion?
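For anyone who wants to check counts like the ones above, here is a minimal sketch of one way they could be reproduced. It assumes the diagnosis codes are stored as 3-digit ICD-9 strings and that non-numeric or negative values (such as "-1") mean "no usable code"; both are our assumptions, not documented facts. The helper names and the toy records are made up for illustration and are not taken from the actual files.

```python
def is_pneumonia(code):
    """True if a code is a whole number in the 480-486 pneumonia range.

    Assumption: codes are 3-digit ICD-9 strings; anything non-numeric
    (e.g. "-1", "V22") is treated as not pneumonia.
    """
    code = str(code).strip()
    return code.isdigit() and 480 <= int(code) <= 486


def pneumonia_patients(records, code_fields):
    """Return the set of DUPERSIDs with a pneumonia code in any listed field."""
    return {
        rec["DUPERSID"]
        for rec in records
        if any(is_pneumonia(rec.get(field, "")) for field in code_fields)
    }


# Toy records standing in for the real data sets (hypothetical values).
conditions = [
    {"DUPERSID": "A", "ICD9CODX": "486"},
    {"DUPERSID": "A", "ICD9CODX": "401"},
    {"DUPERSID": "B", "ICD9CODX": "481"},
    {"DUPERSID": "C", "ICD9CODX": "-1"},
]
hospital = [
    {"DUPERSID": "A", "Icd1x": "486", "Icd2x": "-1", "Icd3x": "-1", "Icd4x": "-1"},
    {"DUPERSID": "C", "Icd1x": "V22", "Icd2x": "-1", "Icd3x": "-1", "Icd4x": "-1"},
]

cond_ids = pneumonia_patients(conditions, ["ICD9CODX"])
hosp_ids = pneumonia_patients(hospital, ["Icd1x", "Icd2x", "Icd3x", "Icd4x"])

# Hospital pneumonia patients that are missing from the conditions set.
missing = hosp_ids - cond_ids
print(len(cond_ids), len(hosp_ids), len(missing))
```

Working with sets of DUPERSIDs rather than record counts keeps patients with multiple records from being counted more than once.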
2. Related to the first question, what exactly does the variable ICD9CODX in the conditions data set represent? How does it relate to the four diagnosis codes in the hospital data?
3. Back to the hospital data set. What exactly does SPECCOND represent? How does it help identify the nosocomial pneumonia cases? The definition is vague, and it does not correlate directly with RSINHOS. For example, there are 238 records where SPECCOND is 2, meaning that the hospital stay is not related to the condition. One would suspect that nosocomial pneumonia cases would be included in this number. However, of the 257 records that show a pneumonia code in one of the four ICDx variables and where SPECCOND is not missing, all have SPECCOND = 1.
So SPECCOND does not help in determining whether a case of pneumonia was nosocomial. RSINHOS may help, as indicated in the blog. However, if one looks at the patients with a pneumonia code and an RSINHOS code of 1, 3, 4, or 5, one finds exactly 19 such patients. As for ANYOPER, there are 15 patients who had an operation and also had pneumonia. These 15 patients overlap with the 19 patients, giving us a total of 24 patients who, we can be reasonably sure, contracted nosocomial pneumonia. (This does not seem like a big improvement over the situation with 20 MRSA patients!) Is our logic in reaching this conclusion flawed?
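The arithmetic behind the total of 24 is a simple union of two patient groups: 19 + 15 - 24 = 10, so 10 patients must be in both groups. A minimal sketch of that bookkeeping follows; the integer IDs are placeholders chosen only to match the group sizes quoted above, and `combine` is a hypothetical helper name, not something from the data set documentation.

```python
def combine(rsinhos_ids, anyoper_ids):
    """Return (patients in both groups, patients in either group)."""
    both = rsinhos_ids & anyoper_ids
    either = rsinhos_ids | anyoper_ids
    return both, either


# Placeholder IDs sized to match the post: 19 patients flagged through
# RSINHOS and 15 flagged through ANYOPER. These are not real DUPERSIDs.
rsinhos_ids = set(range(0, 19))
anyoper_ids = set(range(9, 24))

both, either = combine(rsinhos_ids, anyoper_ids)

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
assert len(either) == len(rsinhos_ids) + len(anyoper_ids) - len(both)
print(len(both), len(either))  # prints "10 24"
```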
4. In the medication data set Rx0304, it is not clear what the nine variables on the Type of Pharmacy Provider (PHARTP1 through PHARTP9) represent. How do these nine variables relate to a single prescription?