The BioInvestigation Index and ISA Tools are out!

[The BII Architecture]

After a long time of work, the other brilliant guys on the NET Project team and I have finally released a public version of the BioInvestigation Index, along with binaries for local installation! (Sources are coming soon; just give us time to clean up, document, etc.)

What is it?

The BII project provides an infrastructure for the storage and retrieval of multi-omics experiments (we call them studies), i.e., studies whose typical design is: prepare a sample and take different measurements on it, such as microarrays, proteomics, 2D gels, sequencing, etc. We aim to keep together the metadata about these experiments (experimental design, sample characteristics and preparation), while leveraging existing omic-specific repositories and formats (e.g., ArrayExpress + MAGETAB, PRIDE + PRIDEML).
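To make the idea concrete, here is a minimal sketch (in Python, purely for illustration) of what "keeping the metadata together while the raw data live in specialised archives" could look like. The class and field names are hypothetical and are not the actual BII model: each sample carries its characteristics and a list of assays, and each assay records which omic-specific repository is meant to hold the measurement and, once submitted, its accession there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Assay:
    """One measurement taken on a sample (e.g. a microarray hybridisation)."""
    technology: str                  # e.g. "transcriptomics", "proteomics"
    repository: str                  # omic-specific archive meant to hold the raw data
    accession: Optional[str] = None  # identifier in that archive, once submitted


@dataclass
class Sample:
    name: str
    characteristics: Dict[str, str] = field(default_factory=dict)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Study:
    title: str
    samples: List[Sample] = field(default_factory=list)

    def technologies(self) -> set:
        """All kinds of measurement performed across the study's samples."""
        return {a.technology for s in self.samples for a in s.assays}

    def external_links(self) -> list:
        """(sample, repository, accession) triples for data already archived elsewhere."""
        return [(s.name, a.repository, a.accession)
                for s in self.samples for a in s.assays if a.accession]


# Usage: one sample measured with two technologies; "E-EXAMPLE-1" is a placeholder.
liver = Sample("liver-01", {"organism": "Rattus norvegicus"}, [
    Assay("transcriptomics", "ArrayExpress", "E-EXAMPLE-1"),
    Assay("proteomics", "PRIDE"),
])
study = Study("Hypothetical toxicity study", [liver])
```

The study object is the single place where the design and the sample descriptions live; the bulky omic-specific data only appear as links.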

How does it work?

We cover the whole pipeline: prepare a submission, submit it, search the stored information. Submissions can be crafted in the ISATAB format, a tabular, spreadsheet-based format that is a compromise between the need for an (end-)user-friendly format and the need for something decently structured and formal (a similar approach was followed for the definition of MAGETAB). Things are made even easier by ISAcreator, a graphical Java tool that works similarly to Excel (or OpenOffice Calc, I don’t like to mention MS only…), with the difference that ISAcreator adds many nice ISATAB-specific features. One of the most interesting is that it connects to ontology servers (OLS or BioPortal) and lets you select the right annotation terms for your submission.
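The following sketch reads a tab-delimited sample table in the spirit of ISATAB. It is simplified and is not the BII parser (which is written in Java): the column headers (`Source Name`, `Characteristics[...]`, `Term Source REF`, `Term Accession Number`, `Sample Name`) follow the ISATAB naming convention, but a real ISATAB file has many more columns and rules. What it illustrates is how an ontology annotation chosen in a tool like ISAcreator can be kept in a flat spreadsheet: the term's source and accession sit in the columns right after the annotated value. The function name `parse_samples` is hypothetical.

```python
import csv
import io
import re

CHAR_RE = re.compile(r"Characteristics\[(.+)\]")


def parse_samples(text):
    """Parse a simplified ISATAB-style sample table into a list of dicts.

    "Term Source REF" / "Term Accession Number" columns are attached to the
    Characteristics[...] column that precedes them.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, records = rows[0], [r for r in rows[1:] if r]  # skip blank lines
    samples = []
    for row in records:
        sample = {"characteristics": {}}
        current = None  # name of the last Characteristics[...] column seen
        for col, value in zip(header, row):
            m = CHAR_RE.fullmatch(col)
            if m:
                current = m.group(1)
                sample["characteristics"][current] = {"value": value}
            elif col == "Term Source REF" and current:
                sample["characteristics"][current]["source"] = value
            elif col == "Term Accession Number" and current:
                sample["characteristics"][current]["accession"] = value
            else:
                sample[col] = value  # plain column, e.g. "Sample Name"
                current = None
        samples.append(sample)
    return samples


TABLE = (
    "Source Name\tCharacteristics[organism]\tTerm Source REF\t"
    "Term Accession Number\tSample Name\n"
    "donor-1\tHomo sapiens\tNCBITaxon\t9606\tsample-1\n"
)
samples = parse_samples(TABLE)
```

Because `Term Source REF` can appear once per annotated column, the header is read positionally rather than with `csv.DictReader`, which would keep only the last of the repeated column names.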

How am I involved?

I mainly work on the layer that handles input/output between ISATAB and other formats. The code on this side reads an ISATAB-based submission and maps/stores it into the BII model, the object/database model that reflects the format and is roughly similar to FuGE. The validator tool and the loader tool are built on this code.

We are going to release the Converter, the counterpart that converts our ISATAB-based model to omic-specific formats, i.e., you can produce a MAGETAB submission (or a PRIDE one, or, in the future, ENA or your-favourite-format-here), so that you can send the omic-specific parts of your multi-omics study to the appropriate specialised repositories and have them linked in the BII, the unified access point.
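A converter like this has to decide which parts of a study go to which specialised repository. As a hedged sketch (the dispatch table, the record fields and `split_by_repository` are all hypothetical, not the Converter's actual API), assays can be grouped by technology using a configurable table, leaving anything without a target in the unified store only. Actually writing MAGETAB or PRIDE XML is out of scope here.

```python
from collections import defaultdict

# Hypothetical dispatch table: which specialised repository receives which
# kind of data, and in which format. In a real installation this would be
# configuration, not code.
DISPATCH = {
    "transcriptomics": ("ArrayExpress", "MAGETAB"),
    "proteomics": ("PRIDE", "PRIDEML"),
}


def split_by_repository(assays, dispatch=DISPATCH):
    """Group assay records by (repository, format).

    Assays whose technology has no entry in the dispatch table are grouped
    under the key None (kept in the unified store only), so nothing is
    silently dropped.
    """
    groups = defaultdict(list)
    for assay in assays:
        groups[dispatch.get(assay["technology"])].append(assay)
    return dict(groups)


# Usage with made-up records for two samples.
assays = [
    {"sample": "sample-1", "technology": "transcriptomics", "file": "raw-1.cel"},
    {"sample": "sample-1", "technology": "proteomics", "file": "run-1.mzML"},
    {"sample": "sample-2", "technology": "2D-gel", "file": "gel-2.tif"},
]
groups = split_by_repository(assays)
```

Keeping the table as data is one way to make "where the omic-specific data go" a configuration choice rather than a code change.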

Customise/integrate your IT infrastructure and data

There are several points where you (well, mainly your computer geeks) will be able to get your hands on our stuff and customise it to your needs. You can configure where the omic-specific data are dispatched and stored. You can configure ISAcreator with the specific requirements you consider important for describing your kinds of submissions. For example, you can decide which controlled vocabulary should be used to fill a field like "Biological Characteristics".

The Import Layer code, which will be released in the next few weeks, should be quite reusable as well (although, at least initially, you will need to know Java). For instance, it has been written so that it’s relatively easy to map an ISATAB submission into a new format for some kind of specialised data. You can do that in a pretty declarative way, so that most programming headaches should be avoided. This is a bonus of the way we designed the code, which allowed us to write the loading part flexibly while the ISATAB format was still being defined; of course, the format changed several times before reaching the current, more stable version.
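The "pretty declarative" mapping could look roughly like the sketch below. It is a Python illustration of the general idea, with hypothetical names (`ExtractRecord`, `EXTRACT_MAPPING`, `load`), not the Import Layer's actual Java API: the mapping from spreadsheet columns to object attributes is plain data, and one generic loader applies it, so supporting a new kind of specialised data mostly means writing a new mapping.

```python
from dataclasses import dataclass


@dataclass
class ExtractRecord:
    """Hypothetical target object for one kind of specialised data."""
    sample: str = ""
    extract: str = ""
    label: str = ""


# The mapping is data, not code: column header -> (target attribute, converter).
EXTRACT_MAPPING = {
    "Sample Name": ("sample", str.strip),
    "Extract Name": ("extract", str.strip),
    "Label": ("label", str.lower),
}


def load(rows, target_cls, mapping):
    """Build one target object per row by applying a declarative column mapping.

    Columns not mentioned in the mapping are ignored, so the same rows can be
    fed to several mappings (e.g. one per output format).
    """
    objects = []
    for row in rows:
        obj = target_cls()
        for column, (attr, convert) in mapping.items():
            if column in row:
                setattr(obj, attr, convert(row[column]))
        objects.append(obj)
    return objects


# Usage: "Protocol REF" is simply ignored because the mapping does not mention it.
rows = [{"Sample Name": " sample-1 ", "Extract Name": "extract-1",
         "Label": "Cy3", "Protocol REF": "labeling"}]
records = load(rows, ExtractRecord, EXTRACT_MAPPING)
```

The design choice being illustrated is that the loader never changes when the format does; only the mapping tables do, which is what makes it possible to keep up with a format that is still evolving.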

In short, we have a code base that should be quite useful for writing similar tabular-to-object-model converters, and for going the other way. More details when we release it. Meanwhile, please have a look at this introduction.

Try it!

So, that’s almost all. Since you have read this far, maybe you want to try the thing too. Check it out here and let us know what you think by joining our newsgroup!


The project is funded by the FP6 grant named Carcinogenomics, which aims to develop computational models for testing the toxicity of chemical compounds. These would be an alternative to animal testing which, besides being good for the animals, should reduce testing costs. The latter has become particularly important for European industry since the European Parliament passed the REACH regulation on the testing required for putting chemical products on the European market (which essentially means more stringent tests in the interest of public safety).


Written by

I'm the owner of this personal site. I'm a geek, specialised in hacking software to deal with heaps of data, especially life science data. I'm interested in data standards, open culture (open source, open data, anything about openness and knowledge sharing), IT and society, and more. Further info here.
