New computer speeds clinical data collection

Software engineer Jay Cowan with the new computer, called a data warehouse appliance, that is helping Vanderbilt scientists more quickly search, filter, analyze and annotate the de-identified medical records of approximately 2 million patients. (photo by Joe Howell)

Tucked in a data center in the basement of Vanderbilt University Hospital, a new computer the size of a large armoire, called a data warehouse appliance, is delivering a new order of speed to Vanderbilt clinical scientists as they search, filter, analyze and annotate the de-identified medical records of approximately 2 million patients.

That’s how many patients have been seen at Vanderbilt University Medical Center since the advent of routine electronic medical record keeping in the early 1990s.

Warehousing de-identified electronic medical records on a massive scale changes how clinical science is conducted. The data needed to resolve your hypothesis likely have already been collected and are mere keystrokes away.

And with the new data appliance at work, finding a set of records that interests you is far quicker, allowing researchers and the rest of the Vanderbilt community to study and conjure with these data as never before.


It’s apparently like trading a horse and buggy for a Saturn V rocket.

“The speed with which we can now assemble and work with large data sets represents a tremendous leap forward for the research enterprise,” Paul Harris, associate professor of biomedical informatics and biomedical engineering, said a few days after the data appliance (a Netezza 1000 from IBM) debuted in late February.

“It’s much more interactive, and things that took you an hour before now take less than a second,” said Brad Malin, associate professor of electrical engineering and computer science, and vice-chair for research in the Department of Biomedical Informatics.


Kevin Johnson, Cornelius Vanderbilt Professor and chair of Biomedical Informatics, helped oversee the acquisition.

“We expect this investment to change personalized medicine translational research at Vanderbilt. When you can access data to see if there is a patient population that satisfies the criteria for a research study, it changes how we think about potential studies and how we can fund these projects.

“Moreover, the work that has been done using the new data appliance to visualize information allows researchers to drag and drop medication names, diagnoses and keywords to learn what studies are possible. Hypotheses and prospective research cohorts can gel very quickly,” Johnson said.

Vanderbilt researchers call their clinical research database the SD, short for the Synthetic Derivative. It is filled with lab results, medications, diagnosis and procedure codes, notes, summaries, histories, clinical messages and more.

The records are stripped of personal identifiers and, without sacrificing their scientific utility, are randomly altered to help prevent re-identification. To use the full power of the SD, researchers obtain approval and sign a data use agreement.

Some clinical investigators run studies on the myriad observable characteristics and demographics captured in the SD, while others study correlations between the SD and genotypes from BioVU, Vanderbilt’s repository of DNA extracted from discarded blood collected during routine clinical testing. BioVU includes samples from 165,000 patients (and counting), all of whom are represented in the SD. In the course of research, some 20,000 of those samples have already been typed for common genetic variants.

“As time goes on, the number of diseases and traits you can analyze with genotype sets and electronic medical record populations is limited only by the diseases that send people to their doctors,” said Josh Denny, M.D., M.S., assistant professor of Biomedical Informatics and Medicine.

The data appliance is not only much faster, but it’s also enabling users to exploit data more fully. Limitations of the previous system of linked servers apparently forced database programmers to favor certain analytical strategies over others.

“This switch puts our medical record data at our fingertips for clinical and genomic studies that we couldn’t have done before,” Denny said.

A related tool, the RC, short for the Record Counter, is available without prior approval to anyone with a Vanderbilt ID and password. Useful for planning a research study and assessing its feasibility, the RC returns patient counts based on clinical and demographic criteria entered by users.

“Say I want to know how many patients there have been at Vanderbilt with diabetes, on drug x, with a hemoglobin A1c level above 10. I can now see that count instantaneously.

“You can see a patient with a rare infection in the clinic, find instantly how many similar cases there have been at Vanderbilt, and within a couple of days have approval for a clinical study using the SD,” Denny said.

The SD has had 250 users to date, the RC 540. Total users are expected to double within a year.

The new, faster Record Counter is easy to use and well worth accessing.

For example, out of 2 million medical records, how many indicate assault by a human bite? — 245.

How many have “cell phone,” “distracted” and “car accident” in their clinical notes? — 160.

How many contain “rock concert” and “hearing loss” in their clinical notes? — 80.

How many have “beer,” “lighter fluid” and “burns” in their clinical notes? — 15.

The masterminds behind the new SD/RC interfaces and the transition to the data appliance include software engineers Jay Cowan and Alex Saip and Analyst Programmer Susan Bradeen.

Paul Govern, (615) 343-9654