So, I came back to the EBI in March. I am now working on the BioInvestigation Index project (formerly BioMAP), for which I have brushed up on good old Java.
What is it about?
In short: we aim to integrate the submission of, and access to, transcriptomics, proteomics and metabolomics data. We will build on the existing EBI repositories, ArrayExpress and PRIDE. We are also developing a tabular submission format, ISA-TAB, which is inspired by MAGE-TAB and FuGE.
Although we won’t be tied to a specific biological domain, one of our goals is to support the European Carcinogenomics project, which addresses the creation of in silico models for toxicity testing. These models will offer an alternative to animal testing. A significant part of the project concerns the use and development of controlled vocabularies and ontologies.
What am I doing specifically?
At the moment I am mainly working on the submission tool component. The idea is to use a tabular, spreadsheet-like format for the input of multi-omics data. At a low level, an instance of this format is a set of Tab-Separated Values (TSV) files, the tab-delimited counterpart of CSV. Such files may be produced with spreadsheet applications like MS Excel or OpenOffice, or programmatically (e.g., for data exchange purposes). We also plan to develop an interactive tool, which will probably be based on Excel and inspired by PRIDE-Harvest.
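To make the TSV layer concrete, here is a minimal Java sketch that reads such a file into a header row plus data rows, which is already close to a relational table. It assumes one record per line, no quoting, and no tabs or line breaks inside cells; the class name TsvTable and the column names in the example are hypothetical, not taken from ISA-TAB or from our code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a TSV file viewed as a header row plus data rows.
// Assumes one record per line, no quoting, no tabs or newlines inside cells.
class TsvTable {
    final List<String> header;
    final List<List<String>> rows;

    TsvTable(List<String> header, List<List<String>> rows) {
        this.header = header;
        this.rows = rows;
    }

    static TsvTable parse(BufferedReader in) throws IOException {
        List<String> header = null;
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue; // skip blank lines
            // limit -1 keeps trailing empty cells, which spreadsheets often export
            List<String> cells = Arrays.asList(line.split("\t", -1));
            if (header == null) {
                header = cells; // first non-blank line holds the column names
            } else {
                rows.add(cells);
            }
        }
        return new TsvTable(header == null ? Collections.<String>emptyList() : header, rows);
    }

    // Convenience overload for in-memory text.
    static TsvTable parse(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
    }

    // Cell in the given data row under the named column, or null if absent.
    String get(int row, String column) {
        int col = header.indexOf(column);
        List<String> cells = rows.get(row);
        return (col < 0 || col >= cells.size()) ? null : cells.get(col);
    }

    public static void main(String[] args) {
        String tsv = "Sample Name\tCharacteristics[Organism]\n"
                + "s1\tHomo sapiens\n"
                + "s2\t\n";
        TsvTable table = parse(tsv);
        System.out.println(table.get(0, "Characteristics[Organism]")); // Homo sapiens
        System.out.println(table.rows.size()); // 2
    }
}
```

Real spreadsheet exports can contain quoted cells, so a production reader would likely need a proper delimited-text parser rather than a plain split.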
I am currently working on the import code, which parses the TSV files and maps them onto our Java object model. To do this in a flexible and reusable way, I treat the tabular format much like a relational database, and I am organizing the mapping task along the lines of a Hibernate mapping. More details are in this presentation.
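The Hibernate-like idea can be sketched as a declarative mapping from column headers to properties of Java objects. This is a minimal illustration under stated assumptions, not the actual import code: ColumnMapping, Sample and the column names (written in the style of MAGE-TAB headers) are hypothetical, and every value is kept as a plain string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical sketch: a declarative mapping from column headers to setters,
// loosely in the spirit of a Hibernate mapping. The mapping is data kept apart
// from the loop that applies it, so a new column needs one new binding only.
class ColumnMapping<T> {
    private final Supplier<T> factory;
    private final Map<String, BiConsumer<T, String>> bindings = new LinkedHashMap<>();

    ColumnMapping(Supplier<T> factory) {
        this.factory = factory;
    }

    // Binds a column header to a setter on the target class.
    ColumnMapping<T> column(String header, BiConsumer<T, String> setter) {
        bindings.put(header, setter);
        return this;
    }

    // Builds one object per row; columns without a binding are ignored.
    List<T> map(List<String> header, List<List<String>> rows) {
        List<T> result = new ArrayList<>();
        for (List<String> row : rows) {
            T obj = factory.get();
            for (int i = 0; i < header.size() && i < row.size(); i++) {
                BiConsumer<T, String> setter = bindings.get(header.get(i));
                if (setter != null) {
                    setter.accept(obj, row.get(i));
                }
            }
            result.add(obj);
        }
        return result;
    }
}

// Hypothetical domain class standing in for part of a Java object model.
class Sample {
    String name;
    String organism;

    void setName(String name) { this.name = name; }
    void setOrganism(String organism) { this.organism = organism; }
}

class MappingDemo {
    public static void main(String[] args) {
        ColumnMapping<Sample> mapping = new ColumnMapping<Sample>(Sample::new)
                .column("Sample Name", Sample::setName)
                .column("Characteristics[Organism]", Sample::setOrganism);
        List<Sample> samples = mapping.map(
                List.of("Sample Name", "Characteristics[Organism]", "Comment[notes]"),
                List.of(List.of("s1", "Homo sapiens", "ignored")));
        System.out.println(samples.get(0).name + " / " + samples.get(0).organism);
        // prints: s1 / Homo sapiens
    }
}
```

Because the bindings are plain data, the same loop could serve any class in the model; a fuller version would also need type conversion, validation and relationships between tables, which this sketch leaves out.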