Human Genome Sciences (HGS), one of Maryland's biotech anchors, was recently purchased by GlaxoSmithKline (GSK). GSK bought HGS for its drug pipeline, but not so long ago HGS was renowned for sequencing human genes and amassing a treasure trove of intellectual property. I had the great honor of being part of the HGS informatics team during that time.
In the early days, we were confronted by a torrent of gene sequences pouring forth from a room full of ABI sequencers. It was important to capture and catalog each gene sequence, making sure it was associated with the correct tissue, disease, and developmental stage. But the sequencing instrument's control software wasn't designed to enforce our indexing rules, so we had to provide that control ourselves.
This problem is not unique. We continually encounter it when building scientific pipelines. This is because instrument builders don’t understand they are part of an informatics supply chain. They envision white coat-clad scientists meticulously typing instructions into their instrument’s control software. These imagined scientists wait expectantly for the output and process it manually when it arrives. But if the instrument is part of an automated pipeline, it often lacks the necessary control features.
One of the first things we do when automating scientific processes is to establish the informatics supply chain. As mentioned above, this often entails augmenting the participating instruments' control software, either with technical or procedural controls. Once established, this supply chain acts as a "conveyor belt," ensuring that high-quality metadata is delivered along with the scientific raw data and the interpreted results. Furthermore, the supply chain must be navigable: scientists must be able to trace back through the process to understand the precise conditions, materials, and equipment used to generate the data.
It is surprising how many labs neglect this simple but necessary step. It is especially common when science is being performed at bench-top scale. Many bench-top labs are neglected by the IT department and left to fend for themselves with Excel, PowerPoint, email, and corporate file shares (and, of course, paper lab notebooks). Although individual scientists can establish good practices, those practices tend to be local and highly manual, and they generate data that is poorly suited to automated interpretation (parsing).
If you are building software solutions in the life sciences industry, or are a scientist in a bench-top lab who wants to prepare for automation, you should consider a couple of practices that can establish your informatics supply chain.
First, design a consistent coding scheme that can be used during sample and reagent accessioning. The coding scheme may contain embedded information or it may just be an anonymous number. Just make it consistent and unique.
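As a concrete illustration, a coding scheme can be sketched in a few lines of Python. The prefix, year field, and serial-number width below are illustrative assumptions, not a standard; the point is that every identifier is generated and validated by the same rule, never improvised at the bench.

```python
import itertools
import re

# Hypothetical coding scheme: fixed prefix, four-digit year, and a
# zero-padded six-digit serial -- e.g. "SMP-1998-000042".
CODE_PATTERN = re.compile(r"^SMP-\d{4}-\d{6}$")

def make_accession_codes(year, start=1):
    """Yield an endless stream of unique accession codes for a given year."""
    for serial in itertools.count(start):
        yield f"SMP-{year}-{serial:06d}"

def is_valid_code(code):
    """Check that an identifier conforms to the agreed scheme."""
    return bool(CODE_PATTERN.match(code))

codes = make_accession_codes(1998)
first = next(codes)  # "SMP-1998-000001"
```

Whether the code embeds information (as the year field does here) or is purely anonymous matters less than having one validator that every system in the pipeline shares.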
Second, be disciplined about using this coding scheme when loading the instrument. The identifiers you load into the instrument's control software will be embedded in its output files, so using the agreed-upon coding scheme greatly enhances traceability. If possible, avoid typing the identifiers by hand, because people will inevitably transpose and substitute digits. Instead, try to programmatically insert a "sample manifest" into the instrument's control software. Most commercial LIMS products can do this, but a point solution can perform the function as well.
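The manifest idea can be sketched as a small writer that refuses any identifier violating the coding scheme, so transposed or hand-mangled IDs are caught before they reach the instrument. The well/sample CSV layout and the `SMP-` pattern below are assumptions for illustration; real instruments each have their own import format.

```python
import csv
import io
import re

# Agreed (hypothetical) coding scheme shared with the accessioning step.
CODE_PATTERN = re.compile(r"^SMP-\d{4}-\d{6}$")

def write_manifest(samples, out):
    """Write a plate manifest (well position -> sample ID) as CSV,
    rejecting any identifier that violates the coding scheme."""
    writer = csv.writer(out)
    writer.writerow(["well", "sample_id"])
    for well, sample_id in samples:
        if not CODE_PATTERN.match(sample_id):
            raise ValueError(f"sample id violates coding scheme: {sample_id!r}")
        writer.writerow([well, sample_id])

buf = io.StringIO()
write_manifest([("A1", "SMP-1998-000001"), ("A2", "SMP-1998-000002")], buf)
```

Because validation happens at manifest-generation time, a typo fails loudly in software rather than silently contaminating the instrument's output files.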
By taking these steps, you can increase the fidelity of your scientific data and provide context for interpretation. And when the "Mother of All Informatics Systems" arrives, your data will snap into the framework with ease.
When I examine laboratory processes, I always look for a way to tie information together in a seamless, navigable way. If, at some point, we get to work together, I’m probably going to ask you … “Where is your lab’s informatics supply chain?”
Image courtesy of Victor Habbic and freedigitalphotos.net