Recently, in my team, we have managed to start periodic talks/discussions on software engineering, including best practices, project management tools, code style, testing and all that jazz. If you are in a large-enough team in a full software house, probably you’re thinking: well, this kind of meetings should be pretty standard in this business. Well, but not in small teams that do bioinformatics research we do all sorts of activities, from writing grant proposals to Agile-style (or kind of) software development, teaching and more.
Anyway, what follows are a few notes I’ve used for one of those meetings, where we have talked about various day-to-day practices on code style and writing programs that are efficient, easy to read and maintain.
As I try to say when I’m asked to talk about these things, I definitely think software engineering is not actual engineering, for it’s more a mix of engineered techniques plus much craftwork, creativity and luck. That’s why, at least in small teams, I don’t want to be too prescriptive of nuances like indentation or naming. So, you’ll read a number of things that are at the crossing between reasonable and personal taste.
I hope you’ll enjoy it. Alternatively, I also have a rude version of it.
Summary
- File names and file organisation
- Naming in general
- Comments
- Braces
- Other code style aspects
- Exceptions
- Language features
Table of contents generated with markdown-toc
File names and file organisation
-
Use the language conventions, eg, in Javascript,
io-utils.js
, orioutils.js
, notioUtils.js
-
Don’t use generic names,
ui-utils.js
, or the Java packagecom.foo.ui.utils
, is better thanutils.js
(orcom.foo.utils
) -
Don’t put everything on the same file. Dually, don’t put just one bit in a single file, but the former is more important.
Space or Tabs?
- Indenting with spaces vs tabs is an historical ferocious war in the domain. So, I usually accept that people indent the way they want, and so do I. That’s because they have different pros/cons. Tabs allow for adjusting the size on your editor, but:
- they’re a burden on certain occasions (eg, looking sources on github via the web browser)
- mixing tabs and spaces results in messed files
- As said, I try to be liberal, but I also try to follow the choice I find in existing files
- Sometimes I reformat for my own readability (sorry)
- I usually avoid to change things like formatting when I’m working on something like a quick patch on a git branch. That’s because such non-functional changes add overhead to the later merge, so I do them at the end, possibly on the main branch.
‘-‘ or ‘_’?
I used to be an underscore guy for naming files containing multiple words/bits. However, a few years ago I discovered that search engines prefer dashes. So, I started using hypens, but certainly I haven’t got obsessed with it, if I’ve already a tons of files using underscore, I just leave them as-is. I think that definitely we all should be liberal on it. However, be consistent, don’t mix the two styles on a new directory.
Arranging program units
I usually organise the units in a file in a "logical top-down“ order, ie, first the most top-level, most abstract and public units, then what they use for their own implementation, lower level units, private units, eg,
// Elements shared in the file/module, define them first, same for class fields or function
// variables
var debugFlag = getConfigParam ( "app.isDebuggingMode" )
// the most abstract unit
def abstract class LoginWidget extends Widget:
...
// concrete, but still top-level
public class SSOWidget extends LoginWidget:
...
ldapRec = oauthHelper ( userid, passwd )
// Low-level, private
def private oauthHelper ( userid: string, passwd: string ) -> LDAPRecord
...
Naming in general
-
Use the language conventions and team conventions
- even more so for dynamically typed languages (Js, Python)
-
Meaningful names, eg
- eg, don’t be too generic:
for i in a: doSomething ( i )
vs:for product in products: updatePrice ( product )
. Sure, we use ‘i’ and ‘n’ all the time, but avoid it in non-trivial cases - eg, don’t be specific when the code is actually generic:
function fixLabel ( l )
, when the function is actually doing:abbreviate ( string )
. In a case like this, it make sense to definefixLabel()
as a function that just callsabbreviate()
. It makes it clear what you want to do with the former, it makes theabbreviate()
available for other uses (instead of locking its implementation insidefixLabel()
), and makes changes tofixLabels()
transparent (eg, one day you can decide that the fix isabbreviate()
plustoLowerCase()
)
- eg, don’t be too generic:
-
Common names and prefixes/postfixes
iXXX
, orixxx
for iteratorsxxxCt
,ctXXX
,xxxSz
,xxxSize
for counters and sizesisXXX
ordoXXX
for flags, toggles, options- when needed,
configFilePath
for the pathconfigFileName
for the name only,configFile
for the object result
for the variable that will eventually be returned (often I prefer a more meaningful name)checkXXX()
,validateXXX()
doXXX()
,processXXX()
XXXConverter
,XXXAdapter
,XXXProxy
,XXXException
,XXXStream
- Be aware of design patterns and try to follow the terminology they use
- Be consistent with existing code, eg, if all the classes in a hierarchy are named like
Widget
,LoginWidget
,SSOLoginWidget
,LocalLoginWidget
, createLDAPLoginWidget
, notLDAPChallengeUI
Comments
-
There is this theory that comments are the last resort, another example
-
True, but it shouldn’t be used as a motivation to write good code, not an excuse to not write comments!
- Define good names
- Define clean, compact, not entangled code
- follow the single-responsibility principle and other SOLID principles (plus other best practices, design patterns, etc)
-
Write things in order of detail, eg, generic 1-sentence description, longer description, including
edge cases and the like, parameter and result description (not needed when they’re obvious),
implementation details. Example:# Sorts the users. # # By default, it uses the natural order comparator, but a custom {@link Comparator} can be used # alternatively (eg, {@link RoleBasedUserComparator}). # # @param users: a list of User objects. The sorting is done in-place, ie, the list will be changed # by the function. Raises an exception if null. # # @return users itself, just for convenience # # The sorting is based on a parallel version of the quick sort algorhitm, which uses the number of # available system cores minus 1. This can be changed via application configuration (@see TODO) # def sortUsers ( users: List
, comparator = NATURAL_ORDER_CMP, asc = True ) -> List ... -
Write comments that make things clearer, don’t state the obvious, eg,
// Return the read configuration return configuration
-
I use the third person on top of class or function definitions ("it does this and that"), I use the imperative form in implementing code ("Read the configuration if it’s still empty")
-
As you can see in the example above, try to use the tags from an auto-documentation system (eg, JSDoc for Javascript, Javadoc for Java). In particular,
- authors add people to ask to in case needed
- date give an idea of when a unit was created or updated
@link
and alike allow for navigation (in IDEs and web)
-
Other common taglets:
TODO:
,DEBUG
,WARNING
-
Don’t leave comments around that lie! If the code has changed and now it mismatches what a comment says, update the comment too.
-
Don’t comment code! There’s git for that! It makes sense on a very few exceptions, eg,
- temporary disabling something that will be re-enabled soon
- Commenting and leaving a comment like: TODO: do we still need it?
Braces
-
Another aspect where I’d like to remain liberal, but sometimes I reformat the code since I can’t read it. In particular, when mixed indentation or bracket styles have been used and some editor around
has messed up with it, shows messy indentation and the like. -
New line or not before the first brace? I am weird on this, since I think mixing the two is the
best. Ie,
try
{
// 20 lines of code, I prefer the opening bracket in a new line
}
catch ( ex ) {
// Just two lines, new line isn't worth it
throw new ValidationException ( ... )
}
- But not sure at all it’s the best, I don’t want to enforce it, it isn’t automatable (actually it is, but I’ve never seen any editor implementing these criteria).
-
Try to not write too long blocks/functions/etc. 5-10 lines are already enough. When it gets longer, try to split into multiple functions or alike. Also, check duplication/redundancy (eg, the horrible if/elseif, see the open-close principle)
- If you need to have long blocks, add comments at the end, eg,
for ( String line: file.readLines () ) { ... // 40 lines of code } // for line
- This helps readability when the begin of block is scrolled out of view.
- That’s even more useful when you have many levels of nesting. But avoid it (see below)
- If you need to have long blocks, add comments at the end, eg,
-
Avoid too much nesting, this is very ugly:
{
button.onClick ( function () {
if (...) {
...
if {
...
if {
...
for (...) {
...
if {
...
}
else if {
...
}
else if {
...
}
else {
...
}
}
}
}
}
});
}
- It should be refactored, split into multiple functions, the if() flow should be reviewed (see below)
-
Also, have a column size limit (mine is 120, it used to be 80)
-
Prefer code that avoids nesting. A classic example:
// Nested version
def doSomething ():
if <guard1>:
<block1>
if <guard2>:
<block2>
if <guard3>:
<block3>
return <value>
else:
return null
else:
return null
else:
return null
#
# More compact version
def doSomething ():
if not <guard1>: return null
<block1>
if not <guard2>: return null
<block2>
if not <guard3>: return null
<block3>
return <value>
- I prefer to skip braces when not needed, eg,
if ( <guard> ) {
return null;
}
//
// I prefer this (without or with the newline, depending on the final statement length):
if ( <guard> ) return null
- That’s not mandatory and WARNING, it’s dangerous if you aren’t well aware of what you’re doing:
if ( <case1> )
if ( <case1.1> ) doCase11()
else
doCase2()
//
// From the indentation and the names, we can infer that the syntax needs to be fixed like:
if ( <case1> ) {
if ( <case1.1> ) doCase11()
}
else // so that the compiler/interpreter doesn't see this as an else of the internal if()
doCase2()
Other code style aspects
Spaces in expression and statements.
I usually put a space before and after almost any element:
public Pair<Double, Double> solve_2nd ( a: real, b: real, c: real )
{
try {
double delta = sqr ( b ^ 2 - 4 * a * c );
double x0 = ( -b + 4 * delta ) / ( 2 * a );
double x1 = ( -b - 4 * delta ) / ( 2 * a );
return Pair.of ( x0, x1 );
}
catch ( MathException ex ) {
throw new MathException ( String.format (
"Error while computing solve_2nd ( %s, %s, %s ): %s", a, b, c, ex.getMessage() ),
ex
);
}
SELECT name, surname, id as order
FROM customer JOIN order ON customer_id = customer.id
- Not mandatory, but it often helps.
- Spaces on the sides of long sub-expressions (
/
in the example above) are more important than other shorter expressions or single values (4*a*c
) - Spaces after open bracket and before closed bracket are less important than the rest
Splitting into multiple lines (and using indentation)
I try to follow the semantics of what I’m writing, to keep together those bits that are logically .together. Of course, in addition to the line length (roughly kept at 120).
I try to indent by following the semantics.
In the examples above:
- new line started with
FROM
(SELECT
) - when throwing the exception, all the things about the constructor are opened in the same line,
then the parameters in the next indented line, a final line with the bracket closes the indentation.- Most of the time, I also put
String.format (
on the next line (all about the first parameter), but here it would suck up too much space that I prefer to use for the message
- Most of the time, I also put
Splitting chains into multiple lines
In these cases, I apply the rules above:
// every chain step in its line, first step in the same line
return names.filter ( n -> n != null )
// usually I indent through all the chain
.filter ( n -> !"".equals ( n ) )
// when the parameter is a block, use indentation and close at the same invocation level
.map ( n -> {
var prefix = n.substring ( 0, 4 );
var accession = n.substring ( 4 );
return Pair.of ( prefix, accession );
})
.collect (
Collectors.toCollection ( LinkedList::new ) // new line when too long
);
Once more, those are my preferences, there are several other reasonable variants.
Exceptions
-
DO NOT catch exceptions when you can’t do anything specific with them. In particular, DO NOT log or report exceptions (
catch (ex) { log.error (...) }
, unless you’ve something peculiar to report. That’s the job of some UPPER LEVEL and you can’t know how it is being done! That is, a web container might send its logs to a configured file, or a remote location, a command line tool might report everything on stderr, etc. See the long discussion in the comments here (started by Mirko Leonardo) -
Those who write empty
catch {}
blocks will go to the most painful circle in Hell, where they’ll be forced to provide Windows 95 assistance to biologists, and for eternity.ex.printStackTrace()
is very slightly better (see below).
-
Catch exceptions if and only if:
- you can do something to recover or change behaviour (eg, I/O error during config reading, log it and fall back into default config)
- it is useful to re-throw a more specific exception (eg, throw a
ConfigError
when you got anIOException
orFileNotFoundException
while reading config). Also, use a more specific message if that’s the case, which reports useful parameters (eg,throw ConfigError ( "error while reading config from '$configFilePath'", ex )
.
-
ATTACH the original cause (exception) to re-thrown exceptions! The stack trace is VERY useful for debugging
- Low-level details might be disturbing for the end users, but it’s up to the UI code to manage their own exceptions properly
-
Ensure you have some upper catch-all code that reports exceptions (eg, to the end user, to the system logger) and logs the entire stack trace. This might be split into a log event at ERROR level that logs the message only, plus a DEBUG level event that reports everything.
- If you don’t define it explicitly, it’s often the language (or framework, or alike) that does it for you, but be sure
-
DO NOT hide exceptions from reporting, except in trivial cases (eg, re-tries). If an exception is very low level and easy recoverable, consider logging it as a DEBUG or TRACE event.
-
Unless a few exceptions, DO NOT LET THE REGULAR FLOW TO CONTINUE AFTER AN EXCEPTION. Eg, if some variables have to be filled with the data from a failed database connection, you risk to continue with null values, or an inconsistent state, which could cause further errors or even security breaches.
-
Consider utilities, eg, jutils, NoException (a dependency of jutils).
Language features
If not too tricky, use the language facilities. Examples below.
- Ternary operator:
String foo;
if ( <condition> )
foo = "value 1";
else
foo = "value 2";
// this is more compact
String foo = "value " + ( <condition> ? 1 : 2 )
- Avoid redundant constructs:
if ( <condition> == true ) // == true isn't needed (in most cases)
return true;
else
return false;
// This is *much* more compact
return <condition> // :-)
-
Use boolean short-circuits:
if ( optionEnabled && validateOption() ) doOption()
In many cases, evaluating a plain variable is faster than evaluating an entire function. Operators like&&
have been designed so that they stop (short-circuit) the evaluation when the boolean operation they’re involved in becomes certainly true or certainly false.- In this example, if the option isn’t enabled, the time-consuming validation isn’t invoked at all.
- This helps with performance, but might also be relevant for the semantics. Eg,
validateOption()
might trigger an error when the option isn’t enabled. - Also, note that exploiting this feature leads to code that is more compact than two nested
if()
- In the opposite case, where you need all the elements of a boolean expression to be evaluated unconditionally, use the corresponding non-short-circuit operators, eg,
if ! ( drawHeader() & drawPanel() ) log ( "Some drawing failed, see the logs" )
-
Use the language helpers, including third-party functions. Eg,
if ( map.get ( key ) == null )
v = map.get ( key );
else
v = DEFAULT;
//
// Use the java.util.Map helpers
v = map.getOrDefault ( key, DEFAULT );
- Use modern paradigms and APIs, eg, as functional programming (examples in Java, Python, Javascript), Java streams, map/reduce, futures and completable futures (also available in Python, Javascript promises are a similar concept).