Welcome to the ChemBio Hub blog, a place for you to find out more about how we operate at a technical and collaborative level and how you can get involved. You can of course visit the ChemBio Hub website.

Workshop on recording and managing data related to small molecules - 9th December 2014

posted 15 Dec 2014 by Karen Porter

The ultimate goal for the ChemBio Hub project is to deliver a University-wide Chemical Biology website and data sharing platform. This will address problems researchers have in capturing and sharing all of their research data, knowing what previous approaches have been tried, finding collaborators, and ensuring they can efficiently manage the wealth of data that their research produces.

To start delivering simple, effective tools that go some way towards addressing these problems, the project team has decided to tackle first the problem of recording and managing information about the small molecules used in research. We therefore hosted a workshop to understand the main functionality users would need in an electronic system that supports this.

This first workshop brought together people from various departments (SGC, Chemistry, Pharmacology, Cardiovascular Medicine and the TDI) with a variety of experience and needs. This was useful in allowing us to understand key features that are common to multiple groups. We will follow this up with more in-depth meetings with targeted users such as those whose primary focus is pure synthetic chemistry.

The key themes that we identified were:

Initial data capture

This needs to be as simple as possible, without putting off potential users by demanding a lot of information that they may not know at the outset. It needs to allow addition of single molecules as well as processing large batches at a time.

The values which need to be captured (or generated) are:

  • A unique identifier plus synonyms for the molecule (including CAS Registry Number)
  • Its structure
  • What is known about its tautomeric and stereochemical forms
  • An identifier for the physical batch of the compound
  • Who made / bought it and why
  • How much was synthesised / bought and when
  • Its molecular weight
  • Salt / hydrate details
  • Who can see information about the molecule
  • Relevant Pan Assay Interference Compounds (PAINS) warnings
  • Intended targets or target class
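As a rough illustration of how these values might be held in software, here is a minimal Python sketch. It is not the ChemBio Hub data model: the class name, field names, units and the choice of which three fields are mandatory are all assumptions made for the example. It shows the two points raised in the workshop, that most values can be left blank at the outset and that single molecules and large batches go through the same route.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class BatchRegistration:
    """One physical batch of a small molecule (hypothetical schema)."""

    # The only values this sketch demands at the outset (an assumption).
    compound_id: str
    structure: str  # e.g. a SMILES string or molfile text
    batch_id: str

    # Everything else may be unknown at registration and filled in later.
    synonyms: list[str] = field(default_factory=list)
    cas_number: Optional[str] = None
    stereochemistry_notes: Optional[str] = None
    tautomer_notes: Optional[str] = None
    registered_by: Optional[str] = None
    source: Optional[str] = None  # "synthesised" or "bought"
    purpose: Optional[str] = None
    amount_mg: Optional[float] = None
    acquired_on: Optional[date] = None
    molecular_weight: Optional[float] = None
    salt_or_hydrate: Optional[str] = None
    visible_to: list[str] = field(default_factory=list)
    pains_warnings: list[str] = field(default_factory=list)
    intended_targets: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields that have not been filled in yet."""
        return [name for name, value in vars(self).items()
                if value is None or value == []]


def register_batch(records: list[dict]) -> list[BatchRegistration]:
    """Register one or many molecules through the same code path."""
    return [BatchRegistration(**record) for record in records]
```

Calling `register_batch` with a list containing a single dictionary registers one molecule; passing thousands of dictionaries parsed from a file would use exactly the same function. `missing_fields()` lets a user see what is still to be added without being forced to supply it up front.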

Searching and filtering

Once data is in the system, it must be easy to find, and searches must be comprehensive. As well as the basic information recorded above, searching needs to include:

  • Awareness of stereoisomers: depending on what is needed, we may want to restrict search results to a single enantiomer, a racemic mixture, or all isomers
  • The ability to search by specific substructure or scaffolds
  • PAINS filtering
  • The ability to search by chemical fingerprints
  • The ability to apply successive CNS Multi Parameter Optimisation criteria
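The idea of applying successive criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: compounds are plain dictionaries, the keys (`stereo`, `pains_warnings`, `clogp`), the stereo labels and the cut-off of 5.0 are hypothetical, and real substructure or fingerprint searching would need a cheminformatics toolkit, which is not attempted here.

```python
from typing import Callable, Iterable

Compound = dict  # hypothetical record: a plain dictionary per compound
Criterion = Callable[[Compound], bool]


def stereo_filter(mode: str) -> Criterion:
    """mode is 'single_enantiomer', 'racemic' or 'all' (illustrative labels)."""
    if mode == "all":
        return lambda c: True
    return lambda c: c.get("stereo") == mode


def no_pains() -> Criterion:
    """Keep only compounds with no recorded PAINS warnings."""
    return lambda c: not c.get("pains_warnings")


def at_most(prop: str, limit: float) -> Criterion:
    """Keep compounds whose property is recorded and does not exceed limit."""
    return lambda c: c.get(prop) is not None and c[prop] <= limit


def apply_successively(compounds: Iterable[Compound],
                       criteria: Iterable[Criterion]):
    """Apply criteria one after another.

    Returns the surviving compounds and the number left after each step,
    so a user can see which criterion removed the most compounds.
    """
    remaining = list(compounds)
    counts = []
    for criterion in criteria:
        remaining = [c for c in remaining if criterion(c)]
        counts.append(len(remaining))
    return remaining, counts
```

For example, `apply_successively(compounds, [stereo_filter("single_enantiomer"), no_pains(), at_most("clogp", 5.0)])` narrows the set step by step. Keeping each criterion as a separate small function means users can reorder or drop steps without changing the search code.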

Batch-specific information

Over time, some batch-specific information related to screening compounds in assays may also become available, including:

  • Physical form
  • Location and amount
  • Plate and well identifiers
  • Bar codes
  • Solvent, volume and concentration of material
  • Known contaminants
  • Purity
  • Storage conditions

Physical properties

These may be known at the outset or might become available over time, but important properties to be recorded are:

  • cLogP
  • Polar Surface Area (PSA)
  • Mass spectrum
  • Melting point
  • 1H and 13C NMR spectra

System needs

Some of the needs identified are unrelated to managing the chemical and biological information. These are nonetheless very important to potential users. Those identified in this first workshop include:

  • Being able to change data held in the system but with a log of what has been changed
  • Limiting the values in some drop-down lists depending on which group the user belongs to and their recent activity
  • File attachments need to be handled simply and logically
  • Functionality which could show ‘publication readiness’ for a compound or group of compounds would be very helpful (i.e. showing which key values are in the system and which are missing)
  • Links to relevant safety information (this requirement came up after the workshop)
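Two of these needs, a log of changes and a ‘publication readiness’ view, can be sketched together in Python. This is a minimal illustration, not the team's design: the class name, the structure of the log entries and in particular the list of key values required for publication are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Hypothetical list of values a compound needs before publication.
KEY_VALUES = ("structure", "molecular_weight", "purity", "melting_point")


@dataclass
class AuditedRecord:
    """Compound data that can be changed, with every change logged."""

    values: dict[str, Any] = field(default_factory=dict)
    history: list[dict[str, Any]] = field(default_factory=list)

    def update(self, user: str, key: str, new_value: Any) -> None:
        """Change a value and log who changed what, and when."""
        self.history.append({
            "user": user,
            "field": key,
            "old": self.values.get(key),  # None if not previously set
            "new": new_value,
            "at": datetime.now(timezone.utc),
        })
        self.values[key] = new_value

    def publication_readiness(self) -> dict[str, list[str]]:
        """Show which key values are in the system and which are missing."""
        present = [k for k in KEY_VALUES if self.values.get(k) is not None]
        missing = [k for k in KEY_VALUES if self.values.get(k) is None]
        return {"present": present, "missing": missing}
```

Because every change goes through `update`, the old value is never lost, and `publication_readiness()` gives a simple present/missing summary that could be shown for one compound or aggregated over a group.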

Later requirements

Workshop attendees agreed that the features above were the key needs in the first instance, but they had a number of ideas for further useful functionality which we would deliver later, including:

  • External data links (to ChEMBL, SciFinder, Reaxys)
  • Links to commercial suppliers (such as eMolecules and Aldrich)
  • Publication references
  • Association of more detailed biological assay data (e.g. via PubChem)
  • Registration and management of macromolecules
  • Integration of several data processing tools into one single interface

What’s happening next

The ChemBio Hub team are now working on an initial tool to meet these needs. Before Christmas we plan to show users what we have done so far, to check that we are on the right track. We will invite the people who attended the workshop and those who planned to attend but were unable to. The session will also be open to anyone else who would like to see what has been developed up to this point. After that, we will release a tool for pilot groups to test, collect more feedback, and tighten up the application further before making it widely available.

If you have any questions or suggestions about the registration of small molecules, please get in touch!
