So, what’s bioinformatics about?

This post was inspired by one of the best students I’ve had the privilege to supervise in my career. As it happens, it started with a few questions in a chat, which were essentially like: since you’re a senior in this field (ie, I’m getting old, personal note…), I’d like to ask you some advice on what job is best, should I do a PhD, what are the other options (education, training, etc) to work as a bioinformatician, what about salaries, success chances, challenges, etc, etc.

After having had to delay a well-thought and comprehensive answer, I decided that maybe it was worth organising my notes on the not-so-easy topic and writing a public post, so that it could benefit everyone.

So, what is being a bioinformatician like? What are the available career options? Should you do a PhD? Is it an academic career, or is there an industry for it?

I’ll try to answer based on my experience and knowledge, which is biased and isn’t extremely widespread. Indeed, apart from collaborations, personal reading and the like, I’ve been working mostly in the UK, only in academia, and mostly on a particular area of the whole field, which is developing software to manage and access life science data. Here you’ve a summary I agree with.

Anyway, let me consider a few aspects.

What’s Bioinformatics about?

I’m gonna answer this question assuming you already know something about it, eg, you got an interest in bioinformatics, you have read some introductory material about it, like papers or books, possibly, you’ve had some experience, like in college or alike. If not, please start from there, maybe I’ll add links later.

The main areas in which I think we can subdivide the field are as follow.

Computational biology and data science

As the title says, this has two main sub-areas. One is the only thing that bioinformatics used to be in its early years (between the late 1990s and early 2000s), that is, developing algorithms and methods to process the (much) data coming from the biotechnologies. Sequence alignment (eg, BLAST), gene expression analysis, statistics for clinical experiments, systems biology (and biochemical models that use math tools such as ordinary differential equations, or Petri Nets) are a few examples.

Is it good? Is bad? It has a lot of number crunching, it requires quite good knowledge of statistics, and also quite good proficiency in biology. Math subjects like operative research or graph theory are also involved. It has much programming too, but not much of the kind of programming that is typically required to develop software at industrial level (see below).

I put data science within here too, meaning what has been becoming very popular in recent years, both in life science and any other domain: machine learning, artificial intelligence, using data science resources such as SciKit Learn or Spark and the clouds. There is some distinction between this and what I’ve just named computational biology, but not enough to keep them very far apart.

I mean, AI methods are general enough that you can apply them to a domain you don’t know much, and still achieve some significant results. However, if you really want to master them in the life science domain, you’d better know life science at a good level, and you should also be proficient enough with statistics and other math topics. You don’t need to know the details of traditional computational biology, but it helps to have an understanding of it. For instance, it’s good to have a good idea of what you’re getting when fetching orthologue genes from a resource like ENSEMBL.

Oh, by the way, this AI-based data science is cool and a hot topic nowadays, it’s a powerful means to make exciting discoveries or develop industrial applications (eg, in clinics), and IT developments will make it even more powerful.

Flip sides? It’s harder than it looks on the surface, the quality of people and companies involved in data science varies a lot, ranging from those who have solid competence and know what they’re doing, to those who are chasing the hype, no matter how bad the data they have or produce are, or the way they apply these methods is out of mind.

Data Management

This is quite close to half of the things I do in my job. It’s about working on collecting, sharing and publishing data, making them easy to access, useful and re-usable for search, exploration or analysis. All of this allows for maximising the benefits that life science data produced every day yield, including making scientific discoveries without relying entirely on wet labs, or even developing industrial applications, eg, to power AI-based diagnostics in clinics or agronomy. Experts in this area know the FAIR principles, data representation methods, such as data models, ontologies or formal logics, existing data standards and resources (repositories, APIs, organisations, people). Data curators (of experimental data, ontologies, repositories), data stewards and data managers are typical job titles that fall within this field.

Usually, hopefully, they have good skills and experience regarding the social aspects of this sub-domain: collaborating with stakeholders having heterogeneous competencies and needs, knowing how to go from discussions about requirements to working products that people actually want to use, good skills in communication, presentation and even negotiation (try to spend a week around the kind of JSON we should use to represent a gene or an experiment, and you’ll know what negotiation means…).

This activity is interesting especially if you want to mix rather diverse competencies, from the specific knowledge in biology (which happens more in areas like data curation) to the social skills said above (which are used more on the rest). Also, there are chances to program and do rather technical/geeky things, such as defining an API or an ontology. The latter in particular, offers fascinating opportunities to explore areas of computer science that are related to philosophy (eg, formal logics) and linguistics (eg, when thinking of text mining tools).

Roles in this area are also cool if one feels the "mission" of sharing data, making data more useful, and contributing to important scientific goals, such as reproducibility or open access to public data.

A disadvantage is that it has its sources of frustration. For instance, it can be very tiring to discuss for hours, to agree even on as simple things as which tables a database should have, or how to name the columns in a spreadsheet. As another example, a new AI algorithm for drug discovery might give more satisfaction and rewards than spending months cleaning, curating and formatting the data that the algorithm needs.

Software development

This is much of what I do. As the title says, this is about developing software that biologists use. It can be software that is focused on computational biology or analysis, so that this kind of job might overlap with the things mentioned above. Or, it can be more similar to traditional software engineering, with the peculiarities of this application domain. Regarding the latter, it is certainly useful to have at least an idea of what life science is and how it works, though there are life science software developers who know little about it. Regarding the software engineering aspect, it is quite similar to many other software engineering businesses, you deal with requirement analysis and design approaches, project management methodologies (especially Agile), coding best practices, code management tools (such as github, but a number of others too).

Conceptually, working on all those things is substantially different than working mostly individually on a script for some analysis, which often is one-off job. For it’s about software that is usually complex in the horizontal direction. That is, usually, it’s not about single very complex algorithms, rather, it’s about putting together different needs and components, writing easy-to-read code that can easily be changed and extended over time, developing performant and scalable software, adaptable to multiple running systems, plus various other needs that together makes this activity as stimulating as complex. Moreover, my opinion is that, together with the technical and problem-solving skills, the first soft skill all of this requires is being a good communicator and team player, understanding others, even those who know nothing about computer science, being able to explain things clearly to them, being good at writing technical documentation, being good at communicating clearly through the code (eg, simple functions, self-explanatory names, little confusion about how to use the parameters and which output to expect).

If you like technical stuff, software engineering can be great in general, and cool in the life science domain. In particular, it’s cool to combine two sciences, computer science and biology, which are so different from each other, ie, one is very mathematical, tends to generalise, to find universal rules to be applied in specific cases without exceptions, the other is more empiric, more about exceptions and approximations than universal rules.

Another advantage of developing software is that its skills learned in the life science context (or in academia) are easier to possibly reuse in practically any other domain, including the private industry. The same applies to data science too, especially AI, but in my opinion, software engineering experience is more generalisable and reusable.

A disadvantage of this kind of job is that, if you are keen on biology, developing software for biologists takes time and commitment, and you’ll have fewer opportunities to work specifically on biology tasks. For example, you might be involved in a specific research project about, let’s say finding insights about the genetics of a disease, but typically, you’ll have less time to work on the biological details of it, and you’ll need to spend more time with bugs or discussing the design of a component with your development team.

Another possible issue is that developing software for science and academia tends to be less valued when considering academic careers, ie, getting publications might be harder, and consequently, in a system that, sadly, is still largely based on the publish-or-perish obsession, becoming professor, or even post-doc, can be harder. The same might be true for data curation roles (but I don’t know them very well).

Then, there are the difficulties of this kind of job: in addition to be good at programming and designing software, as said above, it’s a lot about communication, it requires a lot that you interact with others, keep yourself up-to-date (articles, books, courses, conferences), sometimes fixing a bug right in time for a release can be a nightmare that sucks your energy and sleep. After all, facing all of this is acceptable if you have some interest in the good part. I’d say a passion too, but at least some interest is the minimum.

One thing I feel worth to add about this point, is that, in my opinion, the quality of both software and developer skills is degrading, especially in the domain of scientific software. Regarding this, contributing to best practices and the like is one of the interesting things in this type of job. For instance, see how the software carpentry organisation works on such a goal (and, a similar initiative dedicated to data carpentry).

 Other careers?

There are several other options in bioinformatics, which overlap with the things mentioned above. For instance, teaching and training, managerial and commercial roles (eg, in grant and tech transfer offices), outreaching activities (eg, science journalist). If interested, I let the reader explore these jobs, I don’t know much more about them.

Where should I start from? Do I need a PhD? Is a master enough? Or what?

My impression is that practically, your education needs at least the master level to have good chance of starting any job in any bioinformatics area. But I might be wrong and you never know the routes you meet in life. The best education that might land a job in bioinformatics is, of course, something named bioinformatics-something, but other degrees are valuable too. For instance, all biology-related courses (including chemistry, medicine), computer science-related courses. Also, statistics and mathematics, and anything about data science are relevant. Subjects like economics might be relevant for commercial roles. Even some humanities degrees, such as phylosophy, might be interesting in areas like ontology development. Apart from these obvious titles, if you’re really interested in the field, you should explore the possibilities and you shouldn’t be discouraged by having a background that on paper, doesn’t look as much related. For instance, I have a friend who got a degree in anthropology and developed her graduation thesis in a hospital, working on something named "medical anthropology". Before meeting her, I had no idea something like that exists.

OK, but should I do a PhD?

Ah, yes the PhD…

First, if you want a research career, becoming a scientist in an academic institute, professor or something similar, I don’t think there are many opportunities without a PhD, though there might be in the private industry, where they have less formalised career tracks in research and development departments.

That said, you should ponder if you want to embark on the endeavour of a 3-year PhD programme having in mind a research career, or primarily for the sake of it. A PhD is a demanding journey. You study fantastic stuff from the frontier of research and you learn how to apply knowledge to contribute (minimally, but that’s still great) to building new knowledge, research applications, and improving people’s lives.

And then, it has its ups and downs, and its deal of pains. Mainly of two types. One, doing research is difficult. You need to be good at problem-solving, at wondering questions, at reasoning over: what if? What could be the cause of this? How can I verify it? To face this aspect without compromising your mental and physical health, you need to be really passionate on research and on a particular subject. And you need force of will, strength, resilience. This article says more on this.

Two, the system can be awful and near to legalised slavery. There is an array of horror stories about PIs who don’t follow their PhD students, people forced to impossible hours, bullying, work published without proper credits and authorship. Plus, a sad reality is that the system nowadays has many low-paid positions and the top levels (long-term contracts as researchers, PI roles, professorships) are much limited and difficult to land (a famous reading on this).

On the other hand, in most countries, having a PhD in bioinformatics is valuable, especially because it gives people a unique set of problem-solving and soft skills. So, it’s not that I’m advising against doing it, for me, it has been one of the best and most useful things I’ve done in my life. However, as said above, passion and will are even more necessary to face these social, system-related issues. Also, because of them, I’d recommend that deciding to start 3 years of a PhD programme should primarily be motivated by the opportunity of learning and doing unique things, and you should be open to multiple career paths, not just the academic one.

Finally, be very careful in choosing the programme, the institute, the PI and the group where to do it. A great mentor and a sane environment vs a damn slaver and a toxic group can make a lot of difference between Heaven and Hell.

So, what should I do, then?

First of all, thanks for all the interest and patience that led you to read up to this point. Likely, I still haven’t answered your fundamental questions, ie, what is it best to do? Which job is more interesting? Which one is easier, giving more economical stability and success?

Probably you’ve already guessed it: I can’t answer these questions, for none can, except you. That’s why instead, I’ve tried to provide a list of possibilities and food for thought, which is a partial and subjective list, based on my experience, which is quite long, but as limited as any individual experience can be, and opinionated.

You should explore these possibilities on your own, ask other people around. Possibly, find a mentor to explore different topics. It doesn’t need to be a formally-nominated mentor, just some senior-enough figure, who is willing to have a chat and give pointers from time to time and for a while.

There is one important thing that I want to touch, but which is actually fundamental. The main point here, more than the many details of this or the other job, is a couple of subjective aspects:

  • Ideals and values: what you want to do with your life? What do you value more? Money? Doing interesting things? Succeding in overcoming challenges? Changing the world?

  • Personal preferences and attitudes: what do you like to do? More technical stuff? More interaction with people? Writing and presenting? You need communication and cooperation skills in pretty much anything you do in a high-skill job, certainly in anything about bioinformatics or IT, but, as explored above, a bit more in certain roles. And more: do you like more research, more hands-on and implementation activities, more predictable activities, something about teaching or commercialising technologies or intellectual property?

  • Personal needs and wishes: where do you want to live? How much balance you want between private life and work? Do you want to focus on a family? Sooner? Later? Do you think career ambitions shouldn’t be sacrificed for family ambitions, and vice versa? In many European countries and organisations there is much attention to ensure people can balance these things, but sometimes you find people who are disgraceful on them. Be aware that you need some additional effort to find and maintain such a balance, more in certain careers than in others.

Moreover, consider the specific bioinformatics domain you might be interested in. Examples are base research, applied research, industry, human health, agronomy, environment.

Finally, a few things I’d like to reiterate: I’m no oracle, gather multiple opinions, do your own research, and finally make a decision that is fully yours, balancing heart and mind. In doubt, explore more options, try to keep more than one door open.

And of course, good luck.

Click to rate this post!
[Total: 4 Average: 4]

Written by

I'm the owner of this personal site. I'm a geek, I'm specialised in hacking software to deal with heaps of data, especially life science data. I'm interested in data standards, open culture (open source, open data, anything about openness and knowledge sharing), IT and society, and more. Further info here.

Leave a Reply

Your email address will not be published. Required fields are marked *