Monday, July 21, 2008

Medication dataset questions

1. We observed a variable named ‘LINKIDX’ in the medication dataset. It is defined as ‘ID FOR LINKAGE TO COND/OTH EVENT FILES.’ We are not sure how it links the files. It appears to be built as DUID + PID + X. What is ‘X?’

2. It has variables ‘RXCCC1X’, ‘RXCCC2X’, and ‘RXCCC3X’. They are defined as ‘MODIFIED CLINICAL CLASS CODE.’ Are they modifications of the ICD9 codes in the Hospital/Conditions datasets? If so, how are they modified? How are they related to each other and to the ICD9 codes in the Conditions and Hospital datasets?

3. Similarly, is there a relation between RXICD1X, RXICD2X, and RXICD3X and the ICD9 codes in other files? If they are related, what is the relationship? Are they related to RXCCC1X, RXCCC2X, and RXCCC3X too?

4. Which variable(s) will be removed from the testing dataset?

3 comments:

Nick said...

1. Answer coming soon.

2. Yes, RXCCC1X, etc. contain an alternative coding for diagnosis. Best to ignore these. I believe we will be removing them entirely from the test set.

3. In the Medications file, the ICD codes correspond to the reason the drug was prescribed. They may not be entered, and they may not be accurate, but they're what we have. Further, there are several reasons why they may not correspond to the ICDs from the hospital file - e.g., a hospital diagnosis may not require post-discharge medication. Medications administered in the hospital are not, I believe, recorded anywhere here.

4. We're doing a double-check on this to make sure we don't leave any 'leakers' that would give away any answers. Unfortunately this check may delay the release of the test data for a few days.

As of now, my best answer for what will be removed:

Hospital file: Any ICD code related to a diagnosis of pneumonia. The rest of the record will remain intact.

Medications file: The entire record for any patient with an ICD code indicating pneumonia.

Conditions file: Any record with a pneumonia diagnosis.

Demographics file: Nothing.
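
The removal rules above can be sketched roughly as follows. This is only an illustration, not the actual script used to build the test set. It assumes that "pneumonia" means ICD-9 codes 480 to 486, that codes are stored as strings or integers whose first three characters are the category, and that the Medications rule is read as dropping each record that carries a pneumonia code. The helper names are hypothetical, and the caller supplies the ICD field names (for example ICD1X to ICD4X for the Hospital file, RXICD1X to RXICD3X for the Medications file).

```python
# Assumed pneumonia definition: ICD-9 categories 480 through 486.
PNEUMONIA_RANGE = range(480, 487)


def is_pneumonia(code):
    """Return True if the first three characters of the code fall in 480-486."""
    head = str(code or "").strip()[:3]
    return head.isdecimal() and int(head) in PNEUMONIA_RANGE


def blank_pneumonia_codes(record, icd_fields):
    """Hospital-style rule: blank any pneumonia code, keep the rest of the record."""
    return {
        field: ("" if field in icd_fields and is_pneumonia(value) else value)
        for field, value in record.items()
    }


def drop_pneumonia_records(records, icd_fields):
    """Medications/Conditions-style rule: drop records carrying a pneumonia code."""
    return [
        record
        for record in records
        if not any(is_pneumonia(record.get(field)) for field in icd_fields)
    ]
```

Under these assumptions, a hospital record whose ICD1X is 486 keeps all of its other fields but has ICD1X blanked, while a medication record whose RXICD1X is 482 disappears from the list.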

Sri said...

Quick Comments on your responses to questions (3) and (4)

(3) - The medication file also contains prescriptions from several years in the past (30% of the records are from before 2002). As you mention, there are many cases where we do not find a match for every condition in the hospital file.
Considering the prescriptions in 2003 and 2004, we do find a considerable match between the ICD1X code in the Hospital file and RXICD1X in the medication file. Given this, doesn't it make sense to use RXICD1X to identify the medicines prescribed during the hospital stay for a specific condition?
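
A minimal sketch of the match I have in mind, assuming each file is loaded as a list of dictionaries and that DUID plus PID identifies a patient in both files (that key choice and the function name are my assumptions, not something the documentation confirms). It pairs each hospital record with the prescriptions whose RXICD1X equals that record's ICD1X for the same patient, and skips blank codes so that missing values do not match each other.

```python
from collections import defaultdict


def match_prescriptions(hospital_rows, medication_rows, key_fields=("DUID", "PID")):
    """Pair each hospital row with prescriptions sharing patient key and code."""
    # Index prescriptions by (patient key..., RXICD1X), ignoring blank codes.
    rx_index = defaultdict(list)
    for rx in medication_rows:
        code = rx.get("RXICD1X")
        if not code:
            continue
        rx_index[tuple(rx[f] for f in key_fields) + (code,)].append(rx)

    # Look up each hospital row by (patient key..., ICD1X).
    matches = []
    for stay in hospital_rows:
        code = stay.get("ICD1X")
        key = tuple(stay[f] for f in key_fields) + (code,)
        matches.append((stay, rx_index.get(key, []) if code else []))
    return matches
```

The result is a list of (hospital row, matching prescriptions) pairs, so hospital rows with no matching prescription still appear, with an empty list.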

(4) - I agree with you that nothing needs to be removed from the demographics file in the test set, and with your point on the conditions file. But I am unable to understand the following:

1. Hospital File - I understand ICD2X to ICD4X will not be provided, as they will be used to define the Nosocomial Pneumonia that the model predicts. However, I do not understand why the primary diagnosis code identifying Pneumonia should be removed. Also, I think you did not mean that ICD1X will be completely removed from the test data, because

a. Information such as what kind of treatment/medication the individual underwent after the primary diagnosis is an important predictor of risk (the primary diagnosis code itself cannot leak the answer)

b. If my understanding of (3) stated above is correct, the primary diagnosis code is a key variable for bringing in the information from the medication file

The same argument holds for RXICD1X from the medication file.

2. Medication file - If I understand you correctly, are you saying that any record with RXICD1X in 480 to 486 will be removed? If so, why would you do that, as RXICD1X corresponds to the primary diagnosis only (as explained in (1))?

Nick said...

This is a crossover effect between the two phases of the contest. We can't leave primary diagnosis in the hospital dataset because, in Part 1, we're asking you to predict *any* diagnosis of pneumonia (including primary) for that patient visit. So, for our test set, any occurrence of a pneumonia diagnosis must be removed. Other ICD codes are left intact.