I understand the patient data spans two years (2003 and 2004). Can you shed more light on the data to help answer the following questions?
- This repeats an earlier question: what exactly does the variable ICD9CODX in the conditions data mean? The data dictionary defines it as “Patient Condition”. Does this mean “Patient Condition for which the individual visited the hospital”? If so, doesn’t this confirm that the Pneumonia patients in this file are not nosocomial?
- I understand that the variable ICD9CODX in the conditions file and the diagnosis variables ICD1X to ICD4X in the hospital file give the condition of the patient over the course of the year, but what is the relationship between the values of these variables in the two files? For example, there are cases where the conditions in the conditions file do not exactly match the conditions in the hospital file (DUPERSID = “20048015”).
- My understanding was that the conditions file would be used to identify the Pneumonia patients, who could then be classified as nosocomial or not. But of the 971 unique patients identified with Pneumonia (ICD9CODX between 479 and 487), 693 are not identified with Pneumonia by the primary or secondary diagnosis code in the hospital file. Is the hospital file complete?
- For the conditions and demographic files, it is clear what duplicate records by the unique identifier (DUPERSID) mean. However, this is not clear for the Hospital and Medication files. For example, DUPERSID “20048015” has four records in the conditions file, one for each value of ICD9CODX: 401, 195, 787 and 300. The same DUPERSID also has four records in the Hospital file: two with ICD1X = 195 and two with ICD1X = 560.
(i) Why are there duplicates for ICD1X when there is only one record for a condition in the conditions file? What does the duplicate record for the same condition mean in the hospital file?
(ii) The condition 560 found in the hospital file is not found in the conditions file. Why could this happen?
(iii) The conditions 401, 787 and 300 present in the conditions file are missing in the hospital file. How do we explain
(iv) There are 41 records for DUPERSID (20048015) in medication file. How do we interpret the duplicate records in
(v) In general, how do we interpret the duplicates for an individual in the hospital and the medications files?
- About 83% of records in the hospital file are missing many key variables, such as SPECCOND and RSNINHOS, along with most of the other variables in the file. How do we interpret the missing values?
- In 10% of the records in the Demographic file, EDUCYEAR takes the value -1, which is not specified in the data dictionary. Does -1 indicate “Inapplicable”?
- What is the relationship between the Poverty and Income variables? 35% of individuals with 0 income are classified with Poverty = 4 or 5, so I am curious whether Poverty is based on a broader classification.
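
To make the mismatch for DUPERSID “20048015” concrete, here is a minimal sketch of the comparison I am running between the two files. The rows use only the values quoted above; the list-of-dicts layout and the helper name `compare_codes` are my own assumptions for illustration, not the actual file format, and only the primary diagnosis ICD1X is checked.

```python
from collections import Counter

# Illustrative rows for DUPERSID "20048015", using only the values quoted above.
# The dict layout is an assumption, not the actual file format.
conditions = [
    {"DUPERSID": "20048015", "ICD9CODX": "401"},
    {"DUPERSID": "20048015", "ICD9CODX": "195"},
    {"DUPERSID": "20048015", "ICD9CODX": "787"},
    {"DUPERSID": "20048015", "ICD9CODX": "300"},
]
hospital = [
    {"DUPERSID": "20048015", "ICD1X": "195"},
    {"DUPERSID": "20048015", "ICD1X": "195"},
    {"DUPERSID": "20048015", "ICD1X": "560"},
    {"DUPERSID": "20048015", "ICD1X": "560"},
]


def compare_codes(person_id, conditions, hospital):
    """Compare one person's condition codes with their primary hospital diagnoses."""
    cond_codes = {r["ICD9CODX"] for r in conditions if r["DUPERSID"] == person_id}
    # Count hospital records per primary diagnosis to surface repeated codes.
    hosp_counts = Counter(r["ICD1X"] for r in hospital if r["DUPERSID"] == person_id)
    hosp_codes = set(hosp_counts)
    return {
        "in_both": sorted(cond_codes & hosp_codes),
        "conditions_only": sorted(cond_codes - hosp_codes),
        "hospital_only": sorted(hosp_codes - cond_codes),
        "hospital_repeats": {c: n for c, n in hosp_counts.items() if n > 1},
    }


result = compare_codes("20048015", conditions, hospital)
print(result)
```

For this individual the sketch reports 195 in both files, 300, 401 and 787 only in the conditions file, 560 only in the hospital file, and both hospital codes appearing twice, which is the pattern questions (i) to (iii) ask about.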