Friday, April 18, 2008

Two more questions

1. We found only 20 MRSA cases (by searching for V09) in the hospital file, and only 9 patients in the patient conditions file. Are these all of the MRSA cases, or are there more in the data set?
2. For Part 2, is the ultimate objective to reduce the total cost of MRSA (prophylactic + treatment), or the costs of all the other infections?

Tuesday, April 15, 2008


1. MRSA is represented by ICD9 code V09.0. It can be identified either in the hospital file, using any of the ICD9 fields, or in the patient conditions file, which lists all conditions with which a patient was diagnosed during that year.

2. There is a cost of preventive care that you can determine from the prescription database. Preventive care generally requires one of two antibiotics: Vancomycin or Zyvox (linezolid). Then there is the added cost of MRSA, which is not uniform; rather, it depends upon the patient's condition and the procedures performed. You will need to estimate this added cost from the data. Both emergency and elective patients have some risk of MRSA. The hospital will want to use the preventive treatment if it costs less than the added cost of MRSA.

3. The fields that were not specifically identified in the data dictionary are not needed for the analysis.
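
For anyone who wants a starting point, here is a rough Python sketch of answers 1 and 2 above: flag patients with code V09.0 in either file, then compare the preventive cost against the added cost of MRSA. The field names (PATIENT_ID, ICD9_1, ICD9_2, ICD9), the sample rows, and the dollar figures are all made up for illustration; check the data dictionary for the real names. The optional mrsa_risk weighting is an assumption of this sketch, not part of the contest statement.

```python
def normalize_icd9(code):
    """Drop dots and surrounding spaces and upper-case, so 'v09.0' -> 'V090'."""
    return str(code).strip().upper().replace(".", "")


def is_mrsa_code(code, target="V09.0"):
    """True if the code equals the target once both are normalized."""
    return normalize_icd9(code) == normalize_icd9(target)


def mrsa_patient_ids(records, id_field, code_fields, target="V09.0"):
    """Collect the IDs of records that carry the target code in any listed field."""
    ids = set()
    for rec in records:
        if any(is_mrsa_code(rec.get(field, ""), target) for field in code_fields):
            ids.add(rec[id_field])
    return ids


def prefer_prevention(preventive_cost, added_mrsa_cost, mrsa_risk=1.0):
    """Literal rule: prevention wins if it costs less than the added MRSA cost.

    mrsa_risk is an optional weighting (an assumption of this sketch); the
    default of 1.0 reproduces the literal comparison.
    """
    return preventive_cost < mrsa_risk * added_mrsa_cost


# Hypothetical rows standing in for the hospital and patient conditions files.
hospital_rows = [
    {"PATIENT_ID": "A", "ICD9_1": "V09.0", "ICD9_2": ""},
    {"PATIENT_ID": "B", "ICD9_1": "486", "ICD9_2": "v090"},
    {"PATIENT_ID": "C", "ICD9_1": "250", "ICD9_2": ""},
]
condition_rows = [
    {"PATIENT_ID": "D", "ICD9": "V09.0"},
    {"PATIENT_ID": "A", "ICD9": "V09.0"},
]

# A patient counts as an MRSA case if flagged in either file.
mrsa_ids = (
    mrsa_patient_ids(hospital_rows, "PATIENT_ID", ["ICD9_1", "ICD9_2"])
    | mrsa_patient_ids(condition_rows, "PATIENT_ID", ["ICD9"])
)
print(sorted(mrsa_ids))

# Illustrative numbers only; the real costs must be estimated from the data.
print(prefer_prevention(50.0, 2000.0))
print(prefer_prevention(50.0, 2000.0, mrsa_risk=0.01))
```

If the files store truncated codes (for example V09 without the final digit), pass target="V09" instead. Taking the union of the two ID sets avoids counting a patient twice when they appear in both files.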

Thursday, April 10, 2008

Misc. Questions

1. Part 1. We'd like to confirm which variable in the data set identifies patients diagnosed with MRSA.

2. Part 2. Should we minimize the total cost of medication for all patients, or only for those who were admitted for elective surgery?

3. Some unclear variables:
Hospital0304: VARPSU, VARSTR (for which PSU)

Monday, April 7, 2008

Data Issues In RX0304

This file is a CSV in which a field is quoted if it contains a comma.

1. There are a few instances where a quote is not part of a matching pair, which throws the values off by one column. Typically the pattern is a GX qualifier followed by a dimension that has a quote character in it, and that character is the same quote used to delimit the field.

Here are the line numbers I have checked that have this issue:

40490, 347931, 347932, 347933, 347934, 347935, 371124, 371125, 371126, 371144, 371152, 417874, 417875, 417876, 469465, 469466, 566255, 621479...there are more later.

2. There are two instances where PHARTP1 has the invalid value "%":

139722, 139723

Be aware of both issues, since they can cause problems with import routines.
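
If you want to scan for these problems before importing, here is a rough Python sketch. It flags lines with an odd number of double quotes (a heuristic that assumes no field spans more than one line) and lines where a given column holds "%". The sample rows and the column names other than PHARTP1 are invented for illustration, and whether the line numbers listed above count a header row is not something this sketch can tell you.

```python
import csv


def find_unbalanced_quote_lines(lines, quote='"'):
    """Return 1-based numbers of lines with an odd count of quote characters.

    A properly escaped quote inside a quoted field is doubled, so a
    well-formed single-line record always has an even count.
    """
    return [
        lineno
        for lineno, line in enumerate(lines, start=1)
        if line.count(quote) % 2 == 1
    ]


def find_bad_values(lines, column="PHARTP1", bad_values=("%",)):
    """Return 1-based line numbers where the column holds a bad value.

    Assumes the first line is a header row that names the column.
    """
    reader = csv.DictReader(lines)
    hits = []
    for row in reader:
        if row.get(column) in bad_values:
            # line_num counts source lines read so far, header included
            hits.append(reader.line_num)
    return hits


# Made-up rows: line 3 has an inch mark inside a quoted dimension.
quote_sample = [
    'RXNAME,QUALIFIER,SIZE,QTY',
    '"DRUG A, ORAL",GX,10,30',
    'DRUG B,GX,"4" X 4",20',
]
print(find_unbalanced_quote_lines(quote_sample))

# Made-up rows: lines 3 and 4 carry the invalid "%" value.
value_sample = [
    'RXNAME,PHARTP1',
    'DRUG A,3',
    'DRUG B,%',
    'DRUG C,%',
]
print(find_bad_values(value_sample))
```

For the real file, pass an open file object instead of a list (open it with newline="" as the csv module documentation recommends). The quote check reads raw lines on purpose, because a CSV parser may silently absorb a stray quote and shift the columns.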


Tuesday, April 1, 2008


Thanks for checking out the blog for the 2008 INFORMS Data Mining Contest. Please post questions here so they can be answered publicly, and check back for updates. Good luck!