Good data management practice in academia - What are your options?
Big business takes data management seriously, funnelling millions of pounds into the efficient curation and mining of its data. Geoffrey Moore (author of ‘Crossing the Chasm’ and ‘Inside the Tornado’) has tweeted that ‘without big data analytics, companies are blind and deaf, wandering out onto the Web like deer on a freeway.’ But how does good data management apply to academia and science in general? Scientists work with data daily, but is it managed in a way that maximizes its potential? That depends on what you mean by ‘good’ data management.
A good data management system has several advantages (Figure 1). Most labs don’t realize they lack an effective process until it’s too late. Here’s a simple question: if your senior postdoc left today, how much would you lose? You may have their results and protocols, but could you piece it all together? What about the experiments that didn’t work? And where are the reagents, compounds, plasmids and antibodies they used?
The website businessdictionary.com defines good data management as the ‘administrative process by which the required data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness is ensured to satisfy the needs of the data users.’ Academic data management practices rarely fit this definition. More commonly, academic data management consists of handwritten lab books plus files stored on a server, and the format of data entry varies from one person to the next. It’s a system that is likely to fail.
Assuming you want to set up a data management system in your lab, what should you use? There are many options available (Table 1). The humble Excel spreadsheet is a simple way to manage data, and most scientists are familiar with its features. However, it lacks the robustness of more specialized software, and access is limited to certain computers. A system based on Google Docs has clear advantages: it is similar to Excel, and because it is cloud-based it allows far wider access. However, like Excel, its simple, flat structure means that comparing multiple projects is difficult and prone to human error. There is also no formal standardisation of the data.
Commercial software allows searching across many projects whilst enforcing data standards, which in turn makes the data readily accessible to multiple users. It is also likely to provide additional tools for analysing the data. The obvious drawback is the ongoing subscription cost, which can be prohibitive for many labs. The final option is free data management software. A good freely available package will provide most of the functionality of commercial software, albeit with less polish, and for most academic labs this is enough.
| | Excel | Google Docs | Commercial Software | Free Software |
|---|---|---|---|---|
| Easy to implement | | | | |
| Requires time to learn | | | | |
| Data easily accessible | | | | |
| Dedicated tech support | | | | |
Table 1: Pros and cons of different data management systems.
In reality, a combination of data management tools is likely to be the most effective approach. For example, Excel is a great tool for data input, and entering data into Excel in a machine-readable way is an effective first step towards good data management: use a single header row, one column per variable and one row per record, and avoid merged cells or colour-coding that carries meaning. This allows other data management systems (such as the ChemBio Hub platform) to read and interpret the information and categorise it accordingly. You can learn more about good Excel data management practices at Data Carpentry.
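To make ‘machine-readable’ a little more concrete, here is a minimal Python sketch that checks a sheet for a few common layout problems before it is handed to another system. It assumes the sheet has been exported from Excel as CSV and uses only the standard library. The function name `check_machine_readable`, the sample data and the particular checks (one header row with unique, non-blank names, the same number of cells in every row, and numbers without units typed into numeric columns) are illustrative choices, not the behaviour of ChemBio Hub or any Data Carpentry tool.

```python
import csv
import io

# Hypothetical sample: a sheet exported from Excel as CSV, laid out with one
# header row, one variable per column, one record per row, and units kept in
# the column name rather than typed into the cells.
TIDY_CSV = """sample_id,compound,concentration_uM,absorbance
S1,CMPD-001,10,0.42
S2,CMPD-001,20,0.81
S3,CMPD-002,10,
"""


def check_machine_readable(text, numeric_columns):
    """Return a list of layout problems found in CSV text (empty list if none).

    numeric_columns: names of columns whose non-blank cells must parse as numbers.
    Blank cells are treated as missing values and are not reported.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["sheet is empty"]

    header = rows[0]
    if any(name.strip() == "" for name in header):
        problems.append("blank column name in header")
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")

    # Data rows start at line 2 of the file, just below the header.
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            # Only numeric columns are checked; blank cells count as missing data.
            if name not in numeric_columns or value.strip() == "":
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"row {row_number}: {name} is not a number: {value!r}")
    return problems


if __name__ == "__main__":
    print(check_machine_readable(TIDY_CSV, {"concentration_uM", "absorbance"}))
    # A typical hand-formatted sheet: a repeated header and units or notes typed into cells.
    messy = "sample,conc,conc\nS1,10 uM,see notebook\n"
    for problem in check_machine_readable(messy, {"conc"}):
        print(problem)
```

The tidy sample passes with no problems, while the hand-formatted one is flagged for its duplicated header and for the ‘10 uM’ and ‘see notebook’ entries in a numeric column. Keeping units in the column name and notes in a separate column is what lets software treat every cell in a column the same way.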
Whatever approach you take to manage your data, the most important thing is that it works for you and your lab. There are clear advantages to implementing a system that will stand the test of time. Labs that don’t risk major losses whenever someone moves on, taking with them their knowledge of exactly what is in that freezer.