Welcome to the ChemBio Hub blog, a place for you to find out more about how we operate at a technical and collaborative level and how you can get involved. You can of course visit the ChemBio Hub website.

The future of ChemBio Hub

posted 15 Jun 2016 by Adam Hendry

As many of you know, funding and development of the ChemBio Hub will end on 30th June. Please be reassured that the ChemBio Hub system and all your data will continue to be secured on the SGC servers for the foreseeable future. You can continue to use our services as normal. We expect to be able to make only very limited fixes to issues. To report a minor issue or suggestion, please use the drop-down in the top right of the screen (see figure below). For serious issues or emergencies please contact Dr Brian Marsden (brian.marsden@sgc.ox.ac.uk) who will endeavour to respond as quickly as possible.

"Report an Error" is located in the ChemBio Hub menu

We are extremely proud of what we have achieved and truly feel that we have created something that will help scientists for many years to come. We now have 50 active projects in use from nine different departments across the University of Oxford.

Here are some testimonials highlighting our success:

Professor Renier van der Hoorn - Department of Plant Sciences
Role - Associate Professor Plant Sciences

“The ChemBio Hub system allows us to quickly upload and manage numerous inventories. Having easy access to this data in the laboratory will make my group’s lives easier and more productive”.

Professor James McCullagh - Department of Chemistry
Role - Head of the Mass Spectrometry Research Facility

“The ChemBio Hub platform has proven to be an effective way to manage our compound inventory. Thanks to its high flexibility, we are also testing its ability to digitise our facility’s sample submission process”.

Department of Physiology, Anatomy and Genetics
Role - Technician

“The ChemBio Hub platform allows us to catalogue and maintain our samples from any computer. The ability for us to manage user permissions ensures we can easily give access to researchers who want to use our resources, whilst ensuring the security of our data.”

These testimonials reflect our success in being truly flexible, enabling us to manage data from a broad variety of research fields. This is largely due to the valuable time and feedback provided by over 100 researchers. As such we would like to thank every one of you who took the trouble to speak to us and pass on your thoughts.

Looking to the future, our first publication outlining the ChemBio Hub’s development challenges and achievements will soon be submitted. We are also continuing our search for additional funding to continue developing the system and take it to the logical next level - a fully fledged academic electronic lab notebook. We have collected sufficient feedback to suggest this is seriously in demand and have now built the prototype tools to implement it. We hope that one day soon we can secure the funding to begin developing a production version of this much requested feature.

Creating and managing a project in ChemiReg

posted 22 Feb 2016 by Adam Hendry

A recent update to ChemiReg means that you can now create and manage projects yourself. Creating projects is quick and easy, taking no more than a couple of minutes to set up. Here’s a quick introduction to help get you started.

Creating a Project

On logging in, click the ‘Add Project’ button at the top of the screen to begin creating your new project.

Figure 1: Click Add Project to get started

A new window will open providing options to create and customise your new project. Choose your project name and a few custom fields to get going in seconds, or customise your project further to improve your data’s value:

Figure 2: New Project form explained

There are a number of ways you can customise the data in your project:

  1. Copy and paste the custom fields from another project you own. This is helpful if you want to quickly create a project similar to one you already maintain.
  2. Choose your project’s name. No special characters are allowed (e.g. - % ^ etc).
  3. Choose your project type.
    1. Select ‘Chemical’ if your data has structural information (contains 2D molecular structures or SMILES).
    2. Select ‘Inventory’ if your data contains no structural information.
  4. Add a custom field. Custom fields are the data you wish to be catalogued for each entry and can be anything you want (location, names, etc). There are a number of options available here:
    • a) The name of your field (text characters only).
    • b) Make this field compulsory – it must be filled in to allow data to be saved.
    • c) Choose a field type appropriate to the data you’re submitting.
    • d) Describe what this field is for.
    • e) Make this field only visible to editors (users with the ‘viewer’ permission on this project will not see this specific custom field data).
  5. Delete or reorder a field by clicking and dragging. Fields at the top of the list will appear first in the table.
  6. Add more fields
  7. Disregard / Save project*

*Note that you can’t remove or modify a field heading name after you’ve clicked save, but you can add additional ones and reorder them.

Once you’ve added all your desired fields, simply click save and your project is ready to use.
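
For readers who think in code, the options on the form can be pictured as a small data structure. This is only a sketch: the class names, the rule for what counts as a special character and the example fields are all assumptions made for illustration, not how ChemiReg is actually built.

```python
import re
from dataclasses import dataclass, field

# Hypothetical model of the New Project form; none of these names come from ChemiReg.
PROJECT_TYPES = {"Chemical", "Inventory"}
# Assumption: "no special characters" means letters, digits and spaces only.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9 ]+$")


@dataclass
class CustomField:
    name: str
    required: bool = False      # option (b): must be filled in before saving
    field_type: str = "text"    # option (c)
    description: str = ""       # option (d)
    editors_only: bool = False  # option (e): hidden from viewers


@dataclass
class Project:
    name: str
    project_type: str
    fields: list = field(default_factory=list)

    def __post_init__(self):
        if not NAME_PATTERN.match(self.name):
            raise ValueError(f"invalid project name: {self.name!r}")
        if self.project_type not in PROJECT_TYPES:
            raise ValueError(f"unknown project type: {self.project_type!r}")

    def visible_fields(self, role):
        """Return field names in table order, hiding editor-only fields from viewers."""
        return [f.name for f in self.fields
                if not (f.editors_only and role == "Viewer")]


project = Project(
    name="Plasmid Freezer 2",
    project_type="Inventory",
    fields=[
        CustomField("Location", required=True),
        CustomField("Supplier cost", editors_only=True),
    ],
)
print(project.visible_fields("Viewer"))  # ['Location']
print(project.visible_fields("Editor"))  # ['Location', 'Supplier cost']
```

The order of the `fields` list stands in for the drag-and-drop ordering in step 5: whatever comes first in the list comes first in the table.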

Adding users / changing user permissions on a project

So you have your new project and now you want to let other people use it. This is achieved by assigning a user a role. There are three types of user permissions, Owner, Editor and Viewer. Your assigned role setting on a project will determine what you can and can’t do:

Figure 3: What different user types can do

To add new users to your project or edit current user roles, go ahead and click the ‘Edit User Roles’ button.

Figure 4: Click Edit User Roles to configure what users can do

This opens a new popup window allowing you to edit user permissions. Here you can add or remove users from the role you wish to grant or remove.

Note: Users are only visible after they log in for the first time.

Figure 5: The Edit User Roles form

Once you’re done, click save and the new permission settings will be assigned. Users added to the project will need to refresh their page for the changes to take effect.

An alternative way to quickly add multiple users is to ‘Copy rules from another project’.

Figure 6: You can clone permissions to other projects

This allows you to rapidly assign multiple users based on another project’s permission settings.
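
Conceptually, copying permissions is a merge of one project's role lists into another's. The sketch below is illustrative only: the project names, usernames and the `copy_roles` helper are made up, and it assumes that copying adds users to the target project without removing anyone already there.

```python
# Hypothetical in-memory store: project name -> role -> set of usernames.
roles = {
    "Kinase screen": {"Owner": {"user_a"}, "Editor": {"user_b", "user_c"}, "Viewer": {"user_d"}},
    "Freezer stock": {"Owner": {"user_a"}, "Editor": set(), "Viewer": set()},
}


def copy_roles(store, source, target):
    """Add every user from the source project's roles to the same role on the target."""
    for role, users in store[source].items():
        store[target].setdefault(role, set()).update(users)


copy_roles(roles, "Kinase screen", "Freezer stock")
print(sorted(roles["Freezer stock"]["Editor"]))  # ['user_b', 'user_c']
```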

Now you and your collaborators can get started.

We’ve been working hard to make the best possible user experience for academics everywhere. If you have any comments or would like to see something improved then let us know by contacting info@chembiohub.ox.ac.uk.

Excel Pivot tables - a quick introduction

posted 08 Jan 2016 by Adam Hendry

The Excel pivot table is a powerful tool for getting the most out of large data sets. Tabulating your data is the best way to collate large amounts of information in a concise and meaningful way. However data in this format is difficult to interrogate, hiding the really interesting results you’re after. Using Excel’s pivot table function, you can rearrange your data according to any parameter you specify. This allows you to ask very complex questions with an instant answer.

How do they work?

Let’s say you have a table containing all your cell line assay data:

Figure 1: Example of some cell line assay data

The data includes information about where the cell lines came from, what mutations they have and what assays they have been run in. Using the data above, we want to know which cell lines featuring the gene 9 mutation have been run in an IC50 assay, and we want only male cell line samples.

First we click Insert -> Pivot table (your table will be selected automatically):

Figure 2: Pivot table menu within Excel

Then we choose which parameters we wish to be displayed (based on the question we were asking):

Figure 3: Select pivot table fields in a semantic way

Now we have a table showing only male cell line samples that feature the gene 9 mutation that have also been run in an IC50 assay. Of course you can modify the table to include any parameters you wish, allowing you to very quickly interrogate what was previously an intimidating data set.
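
If you prefer scripting to clicking, the same question can be asked in a few lines of Python. This is a minimal sketch using only the standard library; the rows and column names are invented stand-ins for the table in Figure 1.

```python
from collections import defaultdict

# Made-up rows standing in for the table in Figure 1; column names are assumptions.
rows = [
    {"cell_line": "CL-001", "sex": "Male", "mutation": "gene 9", "assay": "IC50"},
    {"cell_line": "CL-002", "sex": "Female", "mutation": "gene 9", "assay": "IC50"},
    {"cell_line": "CL-003", "sex": "Male", "mutation": "gene 4", "assay": "IC50"},
    {"cell_line": "CL-004", "sex": "Male", "mutation": "gene 9", "assay": "Western blot"},
    {"cell_line": "CL-005", "sex": "Male", "mutation": "gene 9", "assay": "IC50"},
]

# Group cell lines by (sex, mutation, assay), much as choosing those
# three fields in the pivot table dialog does.
pivot = defaultdict(list)
for row in rows:
    pivot[(row["sex"], row["mutation"], row["assay"])].append(row["cell_line"])

# Read off the one group that answers the question.
print(pivot[("Male", "gene 9", "IC50")])  # ['CL-001', 'CL-005']
```

Changing the question is just a matter of changing the key, which is the scripted equivalent of dragging different fields into the pivot table.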

Further reading – Advanced tips and tricks

Pivot tables are a criminally underused feature in Excel that can save a lot of time and effort. They may seem intimidating at first but once you start using them you’ll never look back.

If you’re interested in learning more, there are plenty of online help resources. For University of Oxford staff and students I recommend the IT Services course on pivot tables.

Good data management practice in academia - What are your options?

posted 15 Oct 2015 by Adam Hendry

Big business takes data management seriously, funnelling millions of pounds into its efficient curation and mining. Geoffrey Moore (author of ‘Crossing the Chasm’ & ‘Inside the Tornado’) has tweeted that ‘without big data analytics, companies are blind and deaf, wandering out onto the Web like deer on a freeway.’ But how does good data management apply to academia and science in general? Scientists work with data daily, but is it managed in a way that maximizes its potential? That depends on what you mean by ‘good’ data management.

A good data management system has several advantages (Figure 1). Most labs don’t realize they are lacking an effective process until it’s too late. Here’s a simple question: If your senior postdoc left today, how much would you lose? You may have their results and protocols, but could you piece it all together? What about the experiments that didn’t work? And where are the reagents, compounds, plasmids and antibodies they used? businessdictionary.com defines good data management as the ‘administrative process by which the required data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness is ensured to satisfy the needs of the data users.’ Academic data management practices rarely fit within this definition. More commonly, academic data management is a mix of hand-written lab books and files stored on a server. What’s more, the format of data entry varies from individual to individual. It’s a system that is likely to fail.

Figure 1: Advantages of using a data management system

Assuming you want to set up a data management system in your lab, what do you use? There are many options available (Table 1). The humble Excel sheet is a simple way to manage data, and most scientists are familiar with its features. However, it lacks the robustness of more specialized software and access is limited to certain computers. A Google Docs based system has some clear advantages in its similarity to Excel and it being cloud-based (allowing a far greater degree of access). However, like Excel, its simple and flat nature means that comparing multiple projects is difficult and prone to human error. There is also no formalised standardisation of the data.

Commercial software allows for searching across many projects whilst enforcing data standards. This in turn makes it extremely accessible to multiple users. It is also likely to provide additional tools to analyse the data. The obvious drawback is the ongoing subscription cost, which can be prohibitive for many labs. The final option is using free data management software. A good freely available package will provide most of the functionality of commercial software, but with less of the polish. For most academic labs, this is usually enough.

Systems compared (columns): Excel | Google Docs | Commercial Software | Free Software
Criteria (rows):
  • Easy to implement
  • Requires time to learn
  • Data standardised
  • Data easily accessible
  • Dedicated Tech Support

Table 1: Pros and cons of different data management systems.

In reality a combination of data management tools is likely the most effective way to manage data. For example, Excel is a great tool for data input. Ensuring you input data into Excel in a machine readable way is an effective way to begin implementing good data management. This allows other data management systems (such as the ChemBio Hub platform) to read and interpret the information and categorise it accordingly. You can learn more about good Excel data management practices at Data Carpentry.
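
As a rough illustration of what machine readable means in practice, here is a sketch that checks a sheet exported from Excel as CSV. The column names, the sample rows and the `find_problems` helper are all invented for the example; the point is simply that one header row, one record per row and one value per cell make automated checks like this possible.

```python
import csv
import io

# Made-up example of a sheet exported to CSV: one header row, one record per row,
# one value per cell, no merged cells or colour-coding.
exported = """compound_id,location,amount_mg
CBH-0001,Freezer 2 / Shelf 1,12.5
CBH-0002,Freezer 2 / Shelf 3,
CBH-0003,Freezer 1 / Shelf 2,4.0
"""

REQUIRED = ("compound_id", "location", "amount_mg")


def find_problems(text):
    """Return (row_number, column) pairs for empty required cells."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append((row_number, column))
    return problems


print(find_problems(exported))  # [(3, 'amount_mg')]
```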

Figure 2: The ChemBio Hub Platform

Whatever approach you take to manage your data, the most important thing is it works for you and your lab. There are clear advantages to implementing a system that will stand the test of time. Those who don’t risk major losses any time someone moves on from the lab, taking their knowledge about what exactly is in that freezer with them.

RDKit UGM 2015 - A ChemBio Hub perspective

posted 08 Sep 2015 by Paul Barrett

On 2nd-4th September 2015, Paul and Andy from ChemBio Hub attended the 4th RDKit User Group Meeting at ETH Hönggerberg in Zurich, Switzerland.

The meeting was attended by around 50 RDKit users from across Europe and farther afield.

Greg Landrum provided the opening keynote talk, discussing how we can create better compound identifiers that accurately represent tautomers. InChI-based methods for registering compound uniqueness do have problems, so a mechanism needs to be provided at all times for chemists to be able to specify that their substance is new. Specifying stereochemistry and tautomer information as metadata before providing a unique key was another useful tip. This was all pertinent to us as it mirrors debates we have had when deciding how the compound identifier system for ChemiReg should work.
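
To make the two tips concrete, here is a toy registration routine. It is a sketch under stated assumptions, not the scheme from the talk or the one used in ChemiReg: the `compound_key` and `register` helpers, the hashing choice and the `force_new` flag are all hypothetical. Stereo and tautomer notes are folded into the key as metadata, and the chemist can always overrule the identifier and declare a substance new.

```python
import hashlib

registry = {}  # unique key -> list of notes for that record (unused in this sketch)


def compound_key(structure_id, stereo="", tautomer=""):
    """Hash the structure identifier together with stereo/tautomer metadata."""
    payload = "|".join((structure_id, stereo, tautomer))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]


def register(structure_id, stereo="", tautomer="", force_new=False):
    """Return (key, is_new). force_new lets the chemist overrule the identifier."""
    key = compound_key(structure_id, stereo, tautomer)
    if force_new and key in registry:
        # Append a counter so the chemist's substance gets its own record.
        suffix = sum(1 for k in registry if k.startswith(key))
        key = f"{key}-{suffix}"
    is_new = key not in registry
    registry.setdefault(key, [])
    return key, is_new


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
first = register(ethanol)
second = register(ethanol)                  # same identifier: treated as a duplicate
third = register(ethanol, force_new=True)   # chemist insists it is a new substance
print(first[1], second[1], third[1])  # True False True
```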

Another useful tool discussed on the first day was propbox, presented by Andrew Dalke. Propbox is a Python table builder which allows dynamic calculation of chemical properties based on data in the table. It can also use the results of one dynamic property calculation as input to others.
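
The idea of properties that are calculated on demand, and that can build on one another, can be sketched in plain Python. To be clear, this is not the propbox API: the `PropertyTable` class and the example columns are invented here purely to illustrate the concept.

```python
class PropertyTable:
    """Rows of data plus named calculators that are evaluated lazily and cached."""

    def __init__(self, rows):
        self.rows = rows
        self.calculators = {}
        self.cache = {}

    def add_property(self, name, func):
        # func receives a getter so it can ask for other columns or properties.
        self.calculators[name] = func

    def get(self, index, name):
        if name in self.rows[index]:
            return self.rows[index][name]
        if (index, name) not in self.cache:
            getter = lambda other: self.get(index, other)
            self.cache[(index, name)] = self.calculators[name](getter)
        return self.cache[(index, name)]


table = PropertyTable([{"mass_mg": 5.0, "mol_weight": 250.0, "volume_ml": 2.0}])
table.add_property("millimoles", lambda get: get("mass_mg") / get("mol_weight"))
# This property depends on the result of another calculated property.
table.add_property("conc_mM", lambda get: get("millimoles") / get("volume_ml") * 1000)
print(table.get(0, "conc_mM"))  # 10.0
```

Nothing is calculated until a value is requested, and intermediate results such as `millimoles` are cached so they are only worked out once per row.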

Peter Ertl gave a very interesting talk about natural product likeness calculations. Natural products are substances such as antibiotics and alkaloids and are an excellent source of substructures for bioactive molecules. Peter has developed natural product likeness calculators trained on a set of 45,000 natural products from open sources, with 1 million compounds from ZINC as the non-natural-product-like background.
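
One common way to turn a foreground and a background set into a score is a log-odds over fragment frequencies. The sketch below shows that general idea only. Whether it matches the calculator from the talk in detail is an assumption, the fragment labels are placeholders, and every count is made up for illustration.

```python
import math

# Set sizes quoted in the talk; the per-fragment counts below are made up.
NP_TOTAL = 45_000        # natural products in the training set
BG_TOTAL = 1_000_000     # background compounds
fragment_counts = {
    # fragment label: (count in natural products, count in background)
    "macrolactone": (900, 200),
    "sulfonamide": (45, 50_000),
    "phenol": (9_000, 200_000),
}


def fragment_weight(np_count, bg_count):
    """Log-odds of seeing the fragment in a natural product versus the background."""
    return math.log((np_count / NP_TOTAL) / (bg_count / BG_TOTAL))


def np_likeness(fragments):
    """Average the weights of the known fragments found in a molecule."""
    weights = [fragment_weight(*fragment_counts[f]) for f in fragments if f in fragment_counts]
    return sum(weights) / len(weights) if weights else 0.0


print(np_likeness(["macrolactone"]) > 0)   # True: enriched in natural products
print(np_likeness(["sulfonamide"]) < 0)    # True: enriched in the background
```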

Paul and Andy also gave talks at the end of the day - Paul presented an overview of the ChemBio Hub project to demonstrate how RDKit is being used and to introduce RDKit users to the ChemBio Hub codebase. Andrew showed how a plugin architecture can be built with the ChemBio Hub ChemiReg tool to provide automatic property calculation within results tables.

The post-talks dinner was hosted in central Zurich, a beautiful city and fantastic place to host debate around the topics of the day.

The second day of talks included a demo by Guillaume Godin of some work towards RDKit.js, which looks to be a useful project, and a talk by Riccardo Vianello about django.RDKit, a fantastic integration of RDKit functionality as directly-available Django models. Samo Turk also spoke about extracting hinge binders from structures in the PDB to obtain compound scaffolds, which can reveal compounds that act on kinase hinge regions.

To close the day there was a round table discussion centred around communication within the RDKit community and how best to maintain contact, a request for RDKit to natively support ChemAxon chemical formats, and a friendly reminder that the PDB format is being deprecated: the text-based mmCIF format is now preferred.

The final day of the UGM involved a “hack day” where a number of key improvements or enhancements to RDKit were identified and worked on. Andy got involved with helping to document the deployment process for RDKit involving Packer and Docker, following discussions with others who were interested in the process. Paul got involved with helping to convert the introductory documentation and RDKit cookbook from plain text to interactive IPython notebooks, to help novice users get to grips with using RDKit.

The UGM was a great opportunity to talk to a wide range of cheminformaticians from academia and industry and make useful contacts. We learned a great deal about how we can optimise the ways in which we can use RDKit within our own tools and spread the word about the projects we are working on.

One week to go! Current Expertise and Future Directions In Drug Discovery

posted 27 Jul 2015 by Michael O'Hagan

An Oxford-Industry conversation

We are making the final preparations for our symposium “Current Expertise and Future Directions in Drug Discovery: An Oxford-Industry Conversation” taking place in Oxford this Friday, 31st July.

We will be welcoming 70 industry representatives from over 50 organisations at Vice-President and CEO level, as well as 70 Oxford research group leaders from 30 departments. 50 post-doctoral researchers and DPhil students will be presenting posters, and we will also be joined by technology transfer professionals from Isis Innovation and specialists in knowledge exchange and business development. The tag cloud below shows the rich diversity of participants that will be there on the day.

Tagcloud of ChemBio Hub conversation participants

The symposium will be a great place to have the right conversations that lead to new ideas and new research connections between Oxford and drug discovery groups from the Thames Valley region as well as national and international representatives.

After opening remarks from Professor Andrew Hamilton, FRS (Vice Chancellor, University of Oxford), we will hear 13 very different talks from speakers from 11 departments across the University with wide-ranging interests in drug discovery including phenotypic screening, target identification, disease pathway elucidation, clinical validation and more. We are also looking forward to hearing viewpoints from two industry specialists.

You can see the full programme here. Don’t forget to have your say on Twitter using the symposium hashtag #OxCBH.

After the presentation sessions we will have an open discussion between academics from Oxford and colleagues from industry. Directed by questions from the audience, the panel will explore how Oxford and drug discovery organisations can best work together to translate ideas into new medicines. If you are attending the event, why not prepare a question to pose to the panel?

The conversation will continue over an evening reception in the upper common room of the Andrew Wiles Mathematics Building. This will provide the ideal opportunity to make more connections over a glass of wine and stunning view of the Radcliffe Observatory (below). What better opportunity to arrange further discussions to explore new collaboration ideas?

Radcliffe Observatory

We hope you are as excited about this unique event as we are, and look forward to welcoming you to Oxford this Friday.

Pan-Assay Interference Compounds (PAINS) - what they are & why they matter

posted 10 Mar 2015 by Oakley Cox

PAINS compounds are a real issue which we faced as part of developing our ChemiReg application - here are some thoughts from Oakley Cox, a Chemistry DPhil student at the University of Oxford.

Pan-Assay Interference Compounds (PAINS) are defined by their ability to show activity across a range of assay platforms and against a range of proteins. The most common causes of PAINS activity are metal chelation, chemical aggregation, redox activity, compound fluorescence, cysteine oxidation or promiscuous binding. Many PAINS have multiple functionalities, causing different types of interference and resulting in in vitro and in vivo activity.
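
That definition suggests a simple first check you can run on your own screening data: count how many unrelated targets each compound appears to hit. The sketch below uses invented results, and the threshold of three targets is an arbitrary assumption. A flag from a check like this is a prompt to look more closely, not proof of interference.

```python
from collections import defaultdict

# Made-up screening results: (compound, target protein, assay technology, active?).
results = [
    ("CMPD-1", "Kinase A", "fluorescence", True),
    ("CMPD-1", "Protease B", "AlphaScreen", True),
    ("CMPD-1", "Bromodomain C", "thermal shift", True),
    ("CMPD-2", "Kinase A", "fluorescence", True),
    ("CMPD-2", "Protease B", "AlphaScreen", False),
    ("CMPD-3", "Bromodomain C", "thermal shift", False),
]


def frequent_hitters(rows, min_targets=3):
    """Return compounds active against at least min_targets distinct targets."""
    active_targets = defaultdict(set)
    for compound, target, _assay, active in rows:
        if active:
            active_targets[compound].add(target)
    return sorted(c for c, targets in active_targets.items() if len(targets) >= min_targets)


print(frequent_hitters(results))  # ['CMPD-1']
```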

Why are they a problem?

PAINS have been known for a number of years, and have been well documented in the past [1,2]. Computational filters exist to remove known PAINS from chemical libraries and an experienced medicinal chemist will quickly be able to identify a PAINS-type structure. Even so, the scientific literature is plagued by publications containing ‘selective, potent chemical probes’ with clear PAINS-like structures (see Figure 1).

Figure 1: Compounds reported in the literature as bioactives, but which contain PAINS-like functionalities [3-7].

The pressure on academics to produce impactful science seems to be the most likely cause of PAINS publications. Biologists with little or no chemistry training are, understandably, unlikely to spot suspect structures, but know the inclusion of a small molecule ligand significantly improves the impact of their research. The prevalence of such research also indicates reviewers are not aware of the problem.

The problem is further compounded by the methods employed by academics to discover small molecule ligands. Commercial libraries are bought by research groups for convenience, meaning the same areas of chemical space are repeatedly explored. Computational PAINS filters are far from comprehensive and vendors still include many PAINS-type structures in their catalogues. Many researchers will knowingly screen PAINS to find valid starting points, citing the prevalence of PAINS features in approved drugs (around 7%) as evidence to back their strategy. Yet many of these drugs are special cases and were not developed using a modern screening triage.

How can the scientific community overcome the problem?

Every year, funders’ money is wasted following up futile starting points or obtaining pointless intellectual property. Better awareness of PAINS-like behaviour is needed. Full characterisation and publication of PAINS if and when they are detected, akin to the work completed by Dahlin et al. earlier this year [8], would be a big step forwards. The results would quickly generate better understanding and improve cheminformatic filters.

Perhaps the most effective way to weed out PAINS publications would be to put pressure on reviewers and journal editors to spot problematic structures. It is relatively straightforward to spot PAINS (see Figure 2) and it would not be unreasonable for reviewers to ask for more rigorous evidence for an inhibitor to be described as selective and potent. At this point, it is important to emphasise the difference between rejecting a compound based on scientific evidence and simply dismissing a compound because it appears to be PAINS-like.

Figure 2: How a PAINS compound can be identified [3].

If you’re a researcher with an exciting new hit, how can you be sure it’s not a PAINS compound? Rigorous structure-activity relationship (SAR) studies highlight the role of different parts of the compound for binding. Activity cliffs and nanomolar in vitro activity are good indicators of a genuine inhibitor. Synthesis of upward of 100 analogues would paint an accurate picture of a binding site as well as lead to improvements in both activity and selectivity. It may seem a daunting undertaking, but careful and well-planned SAR exploration can be both effective and attainable in an academic setting.

Oakley is a DPhil student studying at the Structural Genomics Consortium, University of Oxford. He is supervised by Dr Paul Brennan in the Target Discovery Institute (TDI) and is co-supervised by Prof Frank von Delft at Diamond Light Source. Read more here


  1. Walters, W. P.; Stahl, M. T.; Murcko, M. A. Drug Discovery Today 1998, 3, 160.
  2. Baell, J. B.; Holloway, G. A. J. Med. Chem. 2010, 53, 2719.
  3. Baell, J. B. ACS Med. Chem. Lett. 2015, Ahead of Print.
  4. Xin, M.; Li, R.; Xie, M.; Park, D.; Owonikoko, T. K.; Sica, G. L.; Corsino, P. E.; Zhou, J.; Ding, C.; White, M. A.; Magis, A. T.; Ramalingam, S. S.; Curran, W. J.; Khuri, F. R.; Deng, X. Nat. Commun. 2014, 5, 4935.
  5. Chen, F.; Liu, J.; Huang, M.; Hu, M.; Su, Y.; Zhang, X.-k. ACS Med. Chem. Lett. 2014, 5, 736.
  6. Evelyn, C. R.; Duan, X.; Biesiada, J.; Seibel, W. L.; Meller, J.; Zheng, Y. Chem. Biol. (Oxford, U. K.) 2014, 21, 1618.
  7. Nicolaes, G. A. F.; Kulharia, M.; Voorberg, J.; Kaijen, P. H.; Wroblewska, A.; Wielders, S.; Schrijver, R.; Sperandio, O.; Villoutreix, B. O. Blood 2014, 123, 113.
  8. Dahlin, J. L.; Nissink, J. W. M.; Strasser, J. M.; Francis, S.; Higgins, L.; Zhou, H.; Zhang, Z.; Walters, M. A. J. Med. Chem. 2015, Ahead of Print.

Introducing ChemiReg - The ChemBio Hub compound registration system

posted 10 Mar 2015 by Adam Hendry

The ChemBio Hub team is pleased to announce that our new compound registration system, ChemiReg, is now open for initial user testing. ChemiReg is the first step in our mission to capture all chemical biology data from around Oxford University. Following last December’s workshop (see previous blog post) we have developed the key functionality suggested by prospective users.

Current features of ChemiReg:

  • Flexible and easy to use
  • Compounds can be uploaded individually or by their thousands in minutes
  • Compound data is rigorously organised and easily searchable
  • Users can organise their data within their own project folders
  • Searching and exporting data is quick and simple
  • Secure login and IP protection

Upload of compound information is simple. You can draw your compound, copy and paste SMILES and InChIs or upload whole files (we currently accept .xml, .sdf and .cdx formats). This flexibility is key to ensuring ChemiReg works for users from any background or discipline. We have developed functionality that is intuitive – to serve chemists and non-chemists alike.
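
If you are wondering what a bulk upload file looks like on the inside, an .sdf file is plain text: each compound is a structure block followed by optional data fields, and records are separated by a line containing `$$$$`. The sketch below splits such a file and collects the data fields. It is an illustration only, not ChemiReg's own parser: the `read_sdf_records` helper and the sample records are made up, it assumes the first line of each record holds a name, and it never interprets the structure block itself.

```python
def read_sdf_records(text):
    """Split SD file text into records and collect each record's data fields."""
    records = []
    for chunk in text.split("$$$$"):
        lines = chunk.strip("\n").splitlines()
        if not lines:
            continue
        record = {"name": lines[0].strip(), "fields": {}}
        current = None
        for line in lines:
            if line.startswith(">"):
                # Data header lines look like "> <FIELD_NAME>".
                current = line[line.find("<") + 1:line.rfind(">")]
                record["fields"][current] = ""
            elif current is not None and line.strip():
                record["fields"][current] += line.strip()
            elif not line.strip():
                current = None  # a blank line ends the current data field
        records.append(record)
    return records


# Two made-up single-atom records; field names are invented for the example.
sample = """methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <LOCATION>
Freezer 2

$$$$
water
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <LOCATION>
Freezer 1

> <AMOUNT_MG>
4.0

$$$$
"""

for record in read_sdf_records(sample):
    print(record["name"], record["fields"])
```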

The goal is not just to capture information, but to make it well defined and easily searchable. Organisation and presentation of data is therefore just as important as the quality of information itself, and ChemiReg meets these high data management standards. The project management system allows you to organise your data in project folders, and project security settings allow you to control who can see your sensitive information.

So why start with ChemiReg?

ChemiReg is our first step to capturing all of the University’s chemical biology data. Starting here means:

  1. Users will quickly gain benefit in managing their research data, without having to wait for other functionality to be available
  2. We can prove we have the right technology and approach – meaning what we provide next will be sustainable for the future
  3. Users of ChemiReg will be able to give their feedback on the sort of functionality we develop – and the community will shape the final set of tools available

We know from the positive response so far that we are on the right track. With more feedback and a larger community of users we will be able to deliver regular improvements to ChemiReg. We are also in good shape to face the challenge of developing assay capture functionality.

Back to the future

ChemBio Hub is developing quickly, and right now we need your help.

We are currently recruiting pilot users of ChemiReg.

We need researchers to use the ChemiReg utility to provide honest feedback on what they wish was (or wasn’t) there. If you would like trial access to ChemiReg, then please get in touch.

A number of users have already expressed their interest, a promising sign of things to come. Over the next few months we will continue to listen to your feedback and together shape ChemiReg into the perfect tool for organising and sharing your compound data.

The Future of Drug Discovery - Open Innovation

posted 19 Dec 2014 by Michael O'Hagan

Last week Oxford Biotech ran a great conference on “The Future of Drug Discovery: Open Innovation”. A top line-up of speakers from academia and industry gave their views on how open innovation drug discovery is developing and what the opportunities are for the future.

What is open innovation drug discovery…?

Much early stage drug discovery has traditionally taken place behind closed doors. Groups work to identify a drug target and find chemical structures that modulate its activity. As these groups are in competition, details of targets and chemical spaces under study are often left undisclosed. If a promising lead is found, it’s likely to be patented, granting the group exclusive rights over the intellectual property. This is understandable - it takes a lot of time and money to get to this point. Previous assumptions were that the patent protection gives the group time to turn the lead into a clinical molecule, recoup their investment and make profit.

But there’s a growing consensus that this might not be the best way forward, at least not all the time. Working in this closed way means that exploring blind alleys in one research organisation may be duplicated by another. Since so many drug candidates fail at the clinical trial stage (check out this paper) this duplication is clearly extremely inefficient.

Open innovation is the idea that certain stages of the drug discovery pipeline are carried out in a collaborative, non-competitive way. Sharing knowledge and know-how at the early stages reduces wastage and duplication of effort later on. Thanks to the pooling of resources from industry, academia and patient groups, the competitive process to develop a marketable drug can begin from a much more viable starting point.

So what did the six speakers have to say about open innovation?

From the University of Oxford, Professor Andrew Hamilton (Vice Chancellor) spoke of some key developments that cement the University’s role as a centre for drug discovery. These include the Target Discovery Institute, The Precision Cancer Medicine Initiative and the Kennedy Institute. Links between these cutting-edge research centres, the city of Oxford and wider industry will drive innovation in new medicine development in the coming years.

Next followed three speakers from the pharmaceutical industry. The theme of their talks? How their organisations are adapting to include open innovation in their business models.

Dr Trevor Howe spoke about Janssen’s Innovation Centres and JLABS. Based in the US, JLABS provide entrepreneurs with kit, infrastructure and management to bring innovative ideas to fruition. Dr Mark Whittaker (Evotec) argued that ‘closed innovation’ doesn’t always create value, as many small-molecule patent applications are not granted. Evotec now take novel concepts from academia and push them through their well-developed drug discovery platform. Dr Hitesh Sanganee discussed AstraZeneca’s Open Innovation web portal. Launched back in March, this is a one-stop shop for anyone to find out what compounds and targets are available for collaborative work with AZ. Already, innovative solutions to difficult problems have been posed from bright students and early-career researchers as well as seasoned professors.

Professor Birgitte Andersen then spoke about some of the work of the Big Data Institute in supporting and developing policy to turn the UK into a global innovation hub over the next decade.

Professor Chas Bountra gave the final talk - a tour-de-force of the success of the Oxford SGC open innovation model. In making all their reagents and know-how available, free from patents and IP, the SGC are able to collaborate extensively with academia and industry across the world. This access to a vast range of industrial groups, clinicians and patients has unearthed applications of SGC work to diseases far beyond the initial project focus.

ChemBio Hub: our place in open innovation

Professors Hamilton and Bountra were clear: Oxford is a leading centre for open innovation drug discovery in the UK. Our research generates a vast quantity of reagents, data and know-how that can drive innovation in the development of new medicines. ChemBio Hub will collect this knowledge into one place. Drug discovery organisations will have a single location where they can find the Oxford research they need to know about. We’re already talking to pharma and biotech about how they can become involved and what they need from the project. Ultimately, ChemBio Hub will connect the right people so that they can create mutually beneficial relationships. The effect? Increasing the role of Oxford research in innovation and speeding up the drug discovery process.

Workshop on recording and managing data related to small molecules - 9th December 2014

posted 15 Dec 2014 by Karen Porter

The ultimate goal for the ChemBio Hub project is to deliver a University-wide Chemical Biology website and data sharing platform. This will address problems researchers have in capturing and sharing all of their research data, knowing what previous approaches have been tried, finding collaborators, and ensuring they can efficiently manage the wealth of data that their research produces.

In order to start the process of delivering some simple, effective tools that go some way towards addressing these problems, the project team has decided first of all to tackle the problem of recording and managing information about the small molecules used in research. We therefore hosted a workshop to understand the main functionality users would need to have in an electronic system that would support this need.

This first workshop brought together people from various departments (SGC, Chemistry, Pharmacology, Cardiovascular Medicine and the TDI) with a variety of experience and needs. This was useful in allowing us to understand key features that are common to multiple groups. We will follow this up with more in-depth meetings with targeted users such as those whose primary focus is pure synthetic chemistry.

The key themes that we identified then were:

Initial data capture

This needs to be as simple as possible, without putting off potential users by demanding a lot of information that they may not know at the outset. It needs to allow addition of single molecules as well as processing large batches at a time.

The values which need to be captured (or generated) are:

  • A unique identifier plus synonyms for the molecule (including CAS Registry Number)
  • Its structure
  • What is known about its tautomeric and stereochemical forms
  • An identifier for the physical batch of the compound
  • Who made / bought it and why
  • How much was synthesised / bought and when
  • Its molecular weight
  • Salt / hydrate details
  • Who can see information about the molecule
  • Relevant Pan Assay Interference Compounds (PAINS) warnings
  • Intended targets or target class
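
As a rough illustration only, the values listed above could be held in a record like the one sketched below. The `CompoundBatch` type, its field names and the choice of required fields are hypothetical and are not taken from the ChemBio Hub codebase. The sketch simply shows how asking for a small set of required values keeps initial capture simple while everything else can be filled in later.

```ruby
# Hypothetical record for one registered batch of a small molecule.
# Field names are illustrative assumptions, not the ChemBio Hub schema.
CompoundBatch = Struct.new(
  :compound_id, :synonyms, :cas_number, :structure_smiles, :stereo_notes,
  :batch_id, :owner, :reason, :amount_mg, :acquired_on, :molecular_weight,
  :salt_hydrate, :visible_to, :pains_warnings, :targets,
  keyword_init: true
)

# Only a few values are demanded at the outset so registration stays simple.
REQUIRED_FIELDS = %i[compound_id structure_smiles batch_id owner].freeze

# Returns the required fields that are still empty for a given record.
def missing_required(batch)
  REQUIRED_FIELDS.select do |field|
    value = batch[field]
    value.nil? || value.to_s.strip.empty?
  end
end

batch = CompoundBatch.new(
  compound_id: "CBH-000001",                 # made-up identifier
  structure_smiles: "CC(=O)Oc1ccccc1C(=O)O", # aspirin, purely as an example
  batch_id: "CBH-000001-01",                 # made-up batch identifier
  synonyms: ["aspirin"],
  cas_number: "50-78-2"
)

puts missing_required(batch).inspect
```

Here the record is accepted as a draft but is flagged as still needing an owner, which is one way of not putting users off at the outset.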

Searching and filtering

Once data is in the system, it needs to be able to be found simply and comprehensively. As well as the basic information recorded above, searching needs to include:

  • Awareness of stereoisomers
  • Depending on what is needed we may want to specify that we are interested in search results related to a single enantiomer, or a racemic mixture, or all isomers
  • The ability to search by specific substructure or scaffolds
  • PAINS filtering
  • The ability to search by chemical fingerprints
  • The ability to apply successive CNS Multi Parameter Optimisation criteria
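
To make the fingerprint search requirement a little more concrete, here is a minimal sketch of similarity searching with the Tanimoto coefficient, a common measure for comparing chemical fingerprints. The fingerprints, compound names and threshold below are invented for illustration; in practice the fingerprints would be generated by a cheminformatics toolkit rather than typed in by hand.

```ruby
require "set"

# Tanimoto (Jaccard) similarity between two fingerprints, each given as the
# set of bit positions that are switched on.
def tanimoto(fp_a, fp_b)
  union = (fp_a | fp_b).size
  return 0.0 if union.zero?

  (fp_a & fp_b).size.to_f / union
end

# Hypothetical library: compound name => on-bits of a made-up fingerprint.
LIBRARY = {
  "compound_a" => Set[1, 4, 7, 9],
  "compound_b" => Set[1, 4, 8],
  "compound_c" => Set[2, 3, 5]
}.freeze

# Returns [name, score] pairs at or above the threshold, most similar first.
def similar_to(query, library, threshold: 0.3)
  library
    .map { |name, fp| [name, tanimoto(query, fp)] }
    .select { |_, score| score >= threshold }
    .sort_by { |_, score| -score }
end

query = Set[1, 4, 7]
similar_to(query, LIBRARY).each do |name, score|
  puts format("%-12s %.2f", name, score)
end
```

With this toy data the query shares three of four bits with compound_a (0.75) and two of four with compound_b (0.50), while compound_c shares none and is filtered out.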

Batch-specific information

Over time, some batch-specific information may also be available related to screening compounds in assays, including:

  • Physical form
  • Location and amount
  • Plate and well identifiers
  • Bar codes
  • Solvent, volume and concentration of material
  • Known contaminants
  • Purity
  • Storage conditions

Physical properties

These may be known at the outset or might become available over time, but important properties to be recorded are:

  • cLogP
  • Polar Surface Area (PSA)
  • Mass spectrum
  • Melting point
  • 1H and 13C NMR spectra

System needs

Some of the needs identified are unrelated to managing the chemical and biological information, but are nonetheless very important to potential users. Those identified in this first workshop include:

  • Being able to change data held in the system but with a log of what has been changed
  • Limiting the values in some drop-down lists depending on which group the user belongs to and their recent activity
  • File attachments need to be handled simply and logically
  • Functionality which could show ‘publication readiness’ for a compound or group of compounds would be very helpful (i.e. showing which key values are in the system and which are missing)
  • Links to relevant safety information (this requirement came up after the workshop)
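
Two of these needs, a log of changes and a ‘publication readiness’ check, can be sketched in a few lines. This is a hypothetical in-memory example: the `AuditedRecord` class, its field names and the list of key values are assumptions made for illustration, not a description of how ChemBio Hub stores data.

```ruby
# Minimal in-memory record that logs every change made to it.
# Class and field names are hypothetical.
class AuditedRecord
  attr_reader :values, :history

  def initialize(values = {})
    @values = values.dup
    @history = []
  end

  # Changes a field and records who changed it, the old and new values, and when.
  def update(field, new_value, user:)
    old_value = @values[field]
    return if old_value == new_value # nothing changed, so nothing to log

    @values[field] = new_value
    @history << { field: field, from: old_value, to: new_value,
                  by: user, at: Time.now }
  end

  # Lists which of the given key fields are still missing, as a simple
  # publication readiness check.
  def missing(key_fields)
    key_fields.reject { |field| !@values[field].nil? }
  end
end

record = AuditedRecord.new(purity: 95.0)
record.update(:purity, 98.5, user: "jsmith")
record.update(:melting_point, 135.0, user: "jsmith")

record.history.each do |change|
  puts "#{change[:by]} changed #{change[:field]}: " \
       "#{change[:from].inspect} -> #{change[:to].inspect}"
end
puts "Missing for publication: #{record.missing(%i[purity melting_point nmr_1h]).inspect}"
```

The data itself stays editable, but every edit leaves a trace, and the same record can report which key values are still absent.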

Later requirements

Workshop attendees agreed that the features above were the key needs in the first instance, but they had a number of ideas for further useful functionality which we would deliver later, including:

  • External data links (to ChEMBL, SciFinder, Reaxys)
  • Links to commercial suppliers (such as eMolecules and Aldrich)
  • Publication references
  • Association of more detailed biological assay data (e.g. via PubChem)
  • Registration and management of macromolecules
  • Integration of several data processing tools into one single interface

What’s happening next

The ChemBio Hub team are now working on an initial tool to meet these needs. Before Christmas we plan to show users what we have done so far to check that we are on the right track. We will invite the people who attended the workshop, and those who planned to attend but were unable to. It will also be open to any others who would like to see what has been developed up to this point. We will do further work to release a tool that pilot groups can then test, collect more feedback and further tighten up the application before making it widely available.

If you have any questions or suggestions about the registration of small molecules, get in touch…!

Some thoughts from OpenCon 2014: discovery tools

posted 09 Dec 2014 by Michael O'Hagan

ChemBio Hub aims to make life easier for chemical biologists at Oxford. We’ve already designed ChemBio Crunch, a new app to speed up assay data processing. But sometimes we find out about cool tools that are already out there for people to use for free, and we just need to recommend them! We don’t want to reinvent the wheel.

One thing I heard about at OpenCon is Sparrho. These guys realised that linear keyword searching doesn’t always turn up all the results you need from a scientific literature search. You have to make non-linear connections too. And that’s how it works – Sparrho asks users to rate its search results as relevant to their interests or not. These choices, aggregated and anonymised, allow Sparrho to build a bird’s-eye view of different research disciplines. It then uses this to recommend useful results beyond the confines of the search keywords. Clever stuff – try it out for yourself!

Some thoughts from OpenCon 2014: open data

posted 01 Dec 2014 by Michael O'Hagan

Last Wednesday I attended the OpenCon 2014 London meetup. I went to find out a bit more about how students and early career researchers view open data, and think about how this might fit in with the goals of ChemBio Hub. I also wanted to get an idea of what tools are out there to help people discover science and make their work more efficient!

Many aspects of open science were discussed during the day: open access, open peer review and open education all featured at various points. Open data was perhaps the most relevant topic to our work at ChemBio Hub, so here are a few thoughts on this.

What does open data mean in the context of science? It’s the idea that experimental data should be available for everyone to use and republish as they wish. This includes being free from restrictions like copyright and patents. It means ALL data and not just the best results or a good story!

Ross Mounce and Jon Tennant talked about the relevance of this to the biological sciences. They argued the benefits of open data include addressing reproducibility issues and building confidence in results. I liked their observation that a PDF is a document meant for reading, not reuse – and so it can’t be the vehicle for open data.

In the discussion session that followed, people spoke about the challenges and opportunities of open data. One point that stuck in my mind was the need to develop infrastructure and standard methods for sharing data. Another was that effective data sharing requires particular skills that might not be routinely taught to early career researchers right now.

What about ChemBio Hub?

ChemBio Hub aims to become a University-wide resource for researchers to manage their compound and assay data and share it with collaborators. Our approach to this is to provide the right (you guessed it) infrastructure and training. But will all data in the Hub be “open”?

From talking to the research community, our view is that you should decide what data you share, when you share it and who you share it with. Our job is to provide the simple and secure tools for you to do this with minimal effort and fuss.

But “open data” is becoming a big thing for public research funding organisations. Some now mandate that data is made publicly available as a condition of their grants (check out the recent announcement from the Gates Foundation). The talks at OpenCon suggest a growing recognition of the value of open data, and support for it, amongst the scientific community. So as well as offering secure data management, ChemBio Hub will allow scientists to make their data publicly available when they choose – such as by publishing straight to ChEMBL or PubChem at the press of a button. Best of both worlds!

If you have any thoughts on open data or research data management, we’d love to hear them. Get in touch at our usual address: info@chembiohub.ox.ac.uk.

Beginning developing in Rails

posted 13 Nov 2014 by Paul Barrett

ChemBio Hub is committed to building a wide variety of tools and applications for scientists across the spectrum of chemical biology in Oxford. In order to do this we need to keep up to date in terms of programming techniques and languages.

An opportunity to do this came up recently when we needed an application to allow a brief walk-up survey at the recent ChemBio Hub symposium. It was decided that a simple yes/no system should be used to ask people about a small number of concepts which ChemBio Hub could provide applications or information for, which we could then narrow down based on the results of the survey.

Since it was a small application, it fitted well as a tutorial application for Ruby on Rails. Additionally, it was similar to one of the existing official Rails tutorials at [link].

The spec for the application was as follows:

  • users should be able to walk up without entering any personal information to take the survey;
  • the survey should be a linear app, displaying a new question when an answer is given;
  • the survey should have a set number of questions (interests), each with a title, descriptive photo and help text;
  • each question should have multiple true or false (boolean) answers (preferences) associated with it, which make up the results of the survey;
  • on reaching the end of the survey, the user should be informed they have completed it and the next person should use the same screen to start the survey.

I began by creating a new Rails app via the command line

rails new questionapp

I created objects called interest (the question) and preference (the answer). Each preference belonged to a single interest. An interest could have many preferences. I created these from the command line as directed by the Rails tutorial.

rails g model Interest title:string img_path:string help_string:text

rails g model Preference is_preferred:boolean interest:references date:timestamp

I created controllers for interests and preferences to handle the flow of input into the application. Again this was done via the command line.

rails g controller interests

rails generate controller Preferences

The key functionality of the app is located in preferences_controller.rb. When an answer is given, the controller must create a preference representing that answer and then redirect to the next question - or, if this is the last question, to the completion page (which is also the start page!):

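def create
  # Look up the question (interest) that was just answered
  @interest = Interest.find(params[:interest_id])

  # Treat anything other than a "yes" submission as a "no"
  pref = false
  if params[:submit] == "yes"
    pref = true
  end

  # Store the answer as a preference belonging to this interest
  @preference = @interest.preferences.create(is_preferred: pref)

  # Move on to the next question, or finish if this was the last one
  if @interest.next
    redirect_to @interest.next
  else
    redirect_to completed_path
  end
end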

All of the code for this application is available at . You may fork this code as you wish and use it in your own applications or for learning purposes.
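
The controller relies on an `next` method on the interest, which is not shown in this post. As a sketch of what such a method might do, here is a plain-Ruby stand-in that runs without Rails. The in-memory `Interest` struct, its `add` helper and the "next-highest id" rule are all assumptions made for illustration rather than the code used in the survey app.

```ruby
# Plain-Ruby stand-in for the Interest model, so the sketch runs without Rails.
Interest = Struct.new(:id, :title) do
  # All known interests, kept in memory here instead of a database table.
  def self.all
    @all ||= []
  end

  # Hypothetical helper that creates and remembers an interest.
  def self.add(id, title)
    interest = new(id, title)
    all << interest
    interest
  end

  # The interest with the next-highest id, or nil when this is the last one.
  # In a Rails model an equivalent query might be:
  #   Interest.where("id > ?", id).order(:id).first
  def next
    self.class.all.select { |other| other.id > id }.min_by(&:id)
  end
end

first = Interest.add(1, "Curve Fitting")
second = Interest.add(2, "Data Sharing")

# Mirrors the controller's branch: go to the next question, or finish.
[first, second].each do |interest|
  following = interest.next
  destination = following ? "question #{following.id}" : "completed page"
  puts "After '#{interest.title}': #{destination}"
end
```

Returning nil for the last question is what lets the controller's `if @interest.next` check fall through to the completion page.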

The first ChemBio Hub Symposium - what happened?

posted 13 Nov 2014 by Michael O'Hagan

On 10th November 2014 the ChemBio Hub team held a cross-departmental symposium at the Saïd Business School in the centre of Oxford. Attendees from a broad range of departments in the chemical and biological sciences met to hear about cutting-edge research projects happening across the University. This started lots of useful conversations that generated ideas for new research collaborations.

We decided to use Twitter as a way to reach out to those who weren’t able to attend the symposium. We chose a hashtag (#OxCBH) in advance and publicised this to participants. This allowed us to use Storify to collate the tweets and images from the event - perfect for embedding into our blog. Take a look at what happened below. If you want to know more, just get in touch with us!

ChemBio Hub 2014 Symposium walk-up survey results

posted 11 Nov 2014 by Paul Barrett

The recent ChemBio Hub symposium - entitled Chemical Biology across the University of Oxford - was a great multidisciplinary event bringing together PIs and researchers with an interest in chemical biology. They represented at least 15 departments. The presentations throughout the day sparked a number of conversations and ideas in the breaks and beyond. You can find out more about how the day progressed in this blog post and you can also find the posters from the day here.

ChemBio Hub set up a survey to capture interest in aspects of chemical biology research. We used a Yes/No question system on a touchscreen to make it appealing and easy for people to take part. A lot of people were very willing participants - thank you again if you took part!

The results from respondents on the day were as follows:

Validation and Cleansing of raw data

Combining and Comparing Datasets

Curve Fitting

Visualisation of Assay Data

Image Analysis

Data Sharing

Assay-specific tools e.g. qPCR or ELISA

What do these results mean for ChemBio Hub? - It’s clear that visualisation of assay results is a key requirement of a great many chemical biology researchers - and that image processing is less of a priority for Oxford’s ChemBio community. We will use the survey results, along with other feedback, to determine the order of priorities that we work on in future.

Did you attend and not have the chance to take our survey? Or do you otherwise have a strong wish to see one of these areas prioritised? Let us know in the thread comments below.

OOMMPPAA - directed synthesis and data analysis tool

posted 22 Oct 2014 by Anthony Bradley


The ambition of this work is to help chemists to decide which compound to make next. Currently a chemist will use the available protein structures and activity data, find trends in this data and use this to inform future compound design. When I started my project it became clear there were two ways this is currently done.

Screenshot of Anthony Bradley's OOMMPPAA tool

Subjective visualisation

On one side, scientists would look at bound ligands and activity data separately and manually. They would then juggle the two sides in their mind and use their experience to generate hypotheses and suggestions on which compounds to make next. Clearly this mode of working is highly subjective and reliant on decades of experience.

Black boxes

On the other side, computational tools would take in all this data and, using complex algorithms, spit out an answer of what compound to make next. Largely these would be seen as black boxes. If they worked, great; if they didn’t, what have you learned?

A third way?

It struck me that tools could be applied to work with the chemist. They could run through ever enlarging datasets and condense and visualise this data to allow for easier and more objective analysis. They could highlight conflicting aspects of the data and features the chemist may have missed. In this way experienced chemists could work faster and inexperienced ones could work at all.

What does OOMMPPAA do?

OOMMPPAA uses 3D matched molecular pairs to contextualise both activity and inactivity data in its relevant protein environment. It then identifies pharmacophoric transformations between pairs of compounds and associates them with their relevant activity changes. OOMMPPAA presents this data in an interactive application providing the user with a visual summary of important interactions in the context of the binding site.

What doesn’t it do?

It is important to note that OOMMPPAA does not predict anything. It doesn’t itself extrapolate from the data. It doesn’t train a model. It doesn’t do any machine learning. It simply shows the data, as it stands, and highlights the key features within it. We have tips as to how to get the most out of it, but the most important thing is that you use it.

How can I use it?

Integral to all of this was making the tools user friendly and easy to install.

  • Firstly, you can trial the tool online - we even made a demo!
  • Secondly, Windows users can download the tool using our three-click Windows installer. This has full functionality and allows the user to process their own datasets.
  • Thirdly, you can get the full source code. Licensed under Apache. Pull it from Bitbucket. Branch it. Merge it. Use it. We’d love anyone to get involved.

More details about all of these options are available here

How can I learn more?

If you want to know more about the tool, how we’ve used it, or how it works in more detail, go here. If that’s still not enough, email me at oommppaa.help@gmail.com.

Anthony is a DPhil student at the University of Oxford. He works between the Structural Genomics Consortium, Statistics and industrial collaborators GlaxoSmithKline. Anthony develops computational tools to aid drug discovery. Learn more here.

UX in Scientific Software

posted 02 Oct 2014 by Paul Barrett

On Tuesday 30th September 2014 Karen, Andrew and I attended a UX Oxford talk by Roman Pichler (@romanpichler) entitled “UX and Scrum”. The slides have been made available here. In the talk there were a number of interesting ideas put forward relating to UX (User Experience) in terms of project planning - when to start thinking about UX planning, ideas on how UX can fit an agile/scrum workflow and how to reliably keep track of and advance UX plans and ideas.

Concepts from the talk

One interesting idea was that of the persona, complementary to a user story. This describes a target user type in more detail, outlining what they hope to get out of the system and how they may use the system, rather than describing one specific linear process. Roman has constructed a template for building personas which can be found here. More information on personas in project planning can be found here.

Another concept was that of the Product Canvas. This image is taken directly from the slideshare presentation linked above.

The idea is to list personas alongside high-level ideas and functionality, epics and any other “regular” aspects of project or sprint planning, as well as the work to be done in the next sprint and any fully visualised UX designs. This canvas layout is intended for the early stages of planning, but it is also useful for keeping track of good initial ideas that could otherwise be lost because they are difficult to implement early on or lie slightly outside the project’s scope at the start.

Attending this talk was useful in forcing us to think about how important UX is in the work we do. For me in particular, it prompted some thoughts about UX applied to scientific software in general.

What makes bad UX for scientific web based software?

Simply saying that command line software, like a lot of bioinformatics tools, does not have good UX and should be put into a web page is not correct. The fact that such software continues to be used and preferred by many scientific researchers speaks volumes - they find the user experience good enough to provide them with the data they need. Command line interfaces also provide an easy way for communicating help and usage information quickly - an instant message to someone with a 30 character command can get someone up and running (or out of an awkward situation) easily. In summary there is a danger in confusing user interface with user experience.

A lot of problems for web based scientific software come from trying to directly replicate the command line experience and options in a web form - dropdowns and checklists everywhere. The functionality is the same but the user experience is not.

An assumption that users of the software are experienced in using scientific software is also a pitfall. Web based scientific software should enable non-technical users to use the tool and obtain consistent reproducible results comparable with more experienced or technical users.

Lastly, an assumption that users have a computer or other system capable of displaying recent technologies is also a recipe for poor UX. This is becoming less of a problem with tools like Modernizr and developers are adding more fallbacks to their code but it is still a potential problem.

What makes good UX for scientific web based software?

Firstly, good UX is not simply about adding eye candy or new technology - this can help but is not the be all and end all. There has to be a good reason for adding something like this.

There are some things that can enhance UX of scientific software.

  • Clear instructions - Having a monolithic help page lifted from a Unix man page or an entire wiki dedicated to your software application does not mean it is clear how to perform simple tasks, leading to frustrated users. Contextual help, simple paragraphs at the start of sections, small help snippets for individual form fields can all help explain how to use your software and minimise frustration.
  • Example datasets - Having your users know exactly what the system will and will not handle in terms of data formats and files can also contribute towards a good user experience. A good way to do this is to provide example dummy data.
  • Better charting engines - lots of options here such as D3.js that can create interactive charts, narrow down data and display datasets in different ways. You can often also write your own plugins and charts for these if the functionality or chart formats do not exist.
  • Helpful erroring - at some point a user will try and enter terrible or corrupted data or put files in the wrong field. Mysterious 404 error pages or stack traces are not the correct way to handle this of course. Having the system helpfully suggest the action to take to correct this, with an explanation why the data is bad may help the user experience.
  • Presets and profiles - scientific software often has a lot of different options for data parameters, which can take a while for the user to tweak and configure. Having a system which has sets of sensible defaults for commonly used settings for the tool can speed up the process for the user and enhance the user experience.
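
As a small sketch of the last two points, the example below pairs a set of presets with validation that explains what is wrong and what to do about it. The preset names, their settings and the wording of the message are all hypothetical and do not describe any particular ChemBio Hub tool.

```ruby
# Hypothetical presets: sensible defaults for commonly used settings.
PRESETS = {
  "dose_response" => { replicates: 3, outlier_cutoff: 3.0 },
  "single_point"  => { replicates: 2, outlier_cutoff: 2.5 }
}.freeze

# Checks one row of uploaded well readings and returns human-readable
# problems, each saying where the issue is and how to fix it.
def validate_row(row_number, cells)
  problems = []
  cells.each_with_index do |cell, index|
    begin
      Float(cell) # raises if the cell cannot be read as a number
    rescue ArgumentError, TypeError
      problems << "Row #{row_number}, column #{index + 1}: '#{cell}' is not a number. " \
                  "Please enter a numeric reading or leave the well out of the upload."
    end
  end
  problems
end

settings = PRESETS.fetch("dose_response")
puts "Using #{settings[:replicates]} replicates (dose_response preset)"
validate_row(2, ["0.91", "n/a", "1.05"]).each { |message| puts message }
```

The user sees which cell is at fault and what to do next, rather than a stack trace, and starts from a sensible preset instead of a blank form.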

Do you have any thoughts on UX in scientific software? Get in touch with us, we would love to hear your thoughts and how they could be applied to software we write.

ChemBio Crunch - A simple tool to analyse and manipulate ChemBio assay data

posted 30 Sep 2014 by Andrew Stretton

The eventual goals of the ChemBio Hub project include enabling bench scientists to submit their biological activity data to public databases and to share it with specific people. When faced with this challenge an obvious research informatics solution would be to create a data repository and to then address the processes required to get data in there.

We decided the needs of the bench scientist had to come first in order to build a community around ChemBio Hub.

Data is copied and pasted between different tools and Excel templates; calculated results may then be stored in a shared drive, database, etc. Advantages of this approach include:

  • Familiarity and good documentation for the tools used
  • Low learning curve compared to programming
  • Flexibility, easy to adapt to changes in experimental setup

Issues that arise from this approach include:

  • Copy and paste errors leading to incorrect data
  • Management of template files in the group can be difficult
  • No consistent output data format
  • Duplication of effort between scientists and groups
  • Mixing of data and processing logic makes output hard to use

We believed that there must be a better way, and so ChemBio Crunch was born.

There are 4 simple steps to use ChemBio Crunch:

  1. Upload raw data files
  2. Validate plate signals and remove outlier wells
  3. Calculate IC50, Hill slope and relevant errors
  4. Export as XLSX, PNG or PPT
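
To give a feel for the IC50 calculation step, here is a sketch of the four-parameter logistic (Hill) model commonly used for dose-response curves, together with a deliberately crude IC50 estimate by interpolation. This is not ChemBio Crunch’s implementation: the function names and the example data are invented, and a real tool would fit the whole curve by non-linear regression and report errors on the fitted values.

```ruby
# Four-parameter logistic (Hill) model for an inhibition curve:
# the response falls from `top` to `bottom` as concentration rises.
def four_pl(concentration, top:, bottom:, ic50:, hill:)
  bottom + (top - bottom) / (1.0 + (concentration / ic50)**hill)
end

# Crude IC50 estimate: find the two measured points that bracket the
# half-maximal response and interpolate between them on a log10 scale.
def interpolate_ic50(points, top:, bottom:)
  half = (top + bottom) / 2.0
  points.sort_by(&:first).each_cons(2) do |(c1, r1), (c2, r2)|
    next unless r1 >= half && r2 <= half
    return c1 if r1 == r2

    fraction = (r1 - half) / (r1 - r2)
    log_ic50 = Math.log10(c1) + fraction * (Math.log10(c2) - Math.log10(c1))
    return 10**log_ic50
  end
  nil # the data never crosses the half-maximal response
end

# Simulated readings (concentration, % response) from a curve whose true
# IC50 is 1.0 in arbitrary concentration units.
concentrations = [0.03, 0.3, 3.0, 30.0]
points = concentrations.map do |c|
  [c, four_pl(c, top: 100.0, bottom: 0.0, ic50: 1.0, hill: 1.0)]
end

estimate = interpolate_ic50(points, top: 100.0, bottom: 0.0)
puts format("Estimated IC50: %.2f", estimate)
```

On this simulated data the interpolated value lands close to, but not exactly on, the true IC50 of 1.0, which is why fitting the full curve is preferred once outlier wells have been removed.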

Results can then be compared using the publication-ready charting features of ChemBio Crunch.

Setting up the ChemBio Hub blog

posted 02 Sep 2014 by Paul Barrett

As a part of our commitment to provide quality collaborative tools and outreach to the chemical biology community, we have set up this blog to provide some additional information about how we are creating new tools, helping researchers and enabling collaboration.

We had a look at a number of options for the technology platform we could use for the blog. It had to be something that was lightweight and also editable by the less-technical members of our group. The big CMS platforms, such as WordPress and Drupal, provide good WYSIWYG (what you see is what you get) tools for text entry but are too heavyweight and complicated for what we wanted to achieve. Also considered was Ghost, a lightweight blogging platform written in JavaScript and Node.js - while this fulfilled the remit of being lightweight, and allows users to add content via Markdown, it required another hosting solution to be set up and configured.

A good compromise was found with GitHub Pages and Jekyll. Jekyll is a lightweight, blog-aware platform which allows users to draft posts for review, publish them, use templating and all the things which developers expect. Plus it has the advantage of being hosted and run on GitHub - this enables us to have an open source, transparent blog structure that other users can take away and use themselves! Because it uses Markdown, the less-technical members of our group can easily add blog entries for review before they are published and gain an insight into how version control via GitHub works. That helps the developers explain their work better!

Setting up the blog was a breeze thanks to the great documentation and existing sites that could be used as examples. The most useful of these was GitHub Rebase, with easy to follow code and simple structure.

So now we have a blog, we need to fill it! There will be more articles added here by different members of the team. If you have questions for any of us, just click the author name and send us an email!