Every Programmer has its Own Toolbox: jUtils

[Young Worker With his Toolbox - courtesy of http://www.joli-petite.com]

To give an example of what the title of this post means: searching for ‘java utils’ on github yields 3056 results. I checked just a few of them and all were described with something like ‘my personal utilities’. It’s a common experience: despite the existence of projects like Apache Commons or Guava, every programmer ends up building their own small code base of handy functions and components, intended to be used in many different projects rather than being tied to any one in particular.

Reasons to do so vary. Hopefully, a decent software developer tries to leverage existing code as much as possible. Yet, almost always, at some point you find yourself in the situation where that tiny useful bit is needed, a bit that will likely be re-used in other, potentially completely different projects, and maybe not only by you. String manipulation, I/O and regular expressions are the first areas that come to my mind as triggers of such a need. Ignorance and hubris aside, sometimes it is actually worth making such libraries public and advertising their existence.

I am no exception (ignorance and hubris probably included…). My personal lib of utils is now a fairly public project, jUtils (part of ISA-Tools), and the code in it is not only mine any more (i.e., I’m not the only one to blame…), since it contains a few contributions by Eamonn Maguire too. In this post, I’d like to give a brief overview of what jUtils offers. Who knows, that bit you were about to code yourself might be described in the next paragraphs.

Fiddling with Strings and Regular Expressions

An instance of the uk.ac.ebi.utils.regex.RegEx class allows you to compile a regular expression pattern once and re-use it over many match checks (i.e., a handy cache). Additionally, it simplifies calls to the Java regular expression library. For instance, you may find it useful to collect all the groups in a match into an array, rather than using the call-by-index approach offered by the SDK’s Matcher class. It can be used like this:

// Compile the pattern once, then re-use it for many checks
RegEx re = new RegEx ( "foo.*", Pattern.CASE_INSENSITIVE );
assertTrue ( "foo didn't match fool!", re.matches ( "fool" ) );
assertFalse ( "foo matched fox!", re.matches ( "fox" ) );

// Collect all the groups of a match into an array (index 0 is the whole match)
RegEx groupsRe = new RegEx ( "(.*):(.*)" );
String input = "First:Second";
String[] groups = groupsRe.groups ( input );
out.println ( "I have the groups: " + ArrayUtils.toString ( groups ) );
assertNotNull ( "Group matching returns null", groups );
assertEquals ( "Wrong no of returned groups!", 3, groups.length );
assertEquals ( "Wrong value #0 in returned groups!", input, groups [ 0 ] );
assertEquals ( "Wrong value #1 in returned groups!", "First", groups [ 1 ] );
assertEquals ( "Wrong value #2 in returned groups!", "Second", groups [ 2 ] );

// Here only the second, case-insensitive pattern matches the string
assertTrue ( "Wrong result for matchesAny()!",
  RegEx.matchesAny (
    "A test String",
    Pattern.compile ( "foo" ),
    Pattern.compile ( "^.*TEST.*$", Pattern.CASE_INSENSITIVE ) ) );

As you can see, there is another small useful method, matchesAny(); you can easily figure out what it does. By the way, if you only have to search for a set of simple patterns in a string, the methods containsOneOf() and containsOneOfIgnoreCase() in the class StringSearchUtils are certainly faster than regular expressions, since they use String.contains() and the commons StringUtils.containsIgnoreCase().
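To make the idea of searching for several plain substrings more concrete, here is a minimal sketch of how such methods could look. It is not the jUtils source: the class name StringSearchSketch and the signatures are assumptions, and the case-insensitive variant lower-cases both sides instead of calling the commons StringUtils.containsIgnoreCase(), just to keep the example free of dependencies.

```java
import java.util.Locale;

// Hypothetical sketch, not the actual StringSearchUtils code.
final class StringSearchSketch {
  private StringSearchSketch() {}

  /** True if target contains at least one of the given substrings. */
  static boolean containsOneOf(String target, String... matches) {
    if (target == null) return false;
    for (String m : matches) {
      if (m != null && target.contains(m)) return true;
    }
    return false;
  }

  /** Same as above, ignoring case (assumption: plain lower-casing is good enough). */
  static boolean containsOneOfIgnoreCase(String target, String... matches) {
    if (target == null) return false;
    String lower = target.toLowerCase(Locale.ROOT);
    for (String m : matches) {
      if (m != null && lower.contains(m.toLowerCase(Locale.ROOT))) return true;
    }
    return false;
  }
}
```

With this sketch, containsOneOf ( "A test String", "foo", "TEST" ) is false, while containsOneOfIgnoreCase() with the same arguments is true.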

A Bit of I/O

I use IOUtils.readInputFully() when I need to put all the contents of a file (or any other input stream, such as a connection) into a string. There are several occasions when this is useful; for example, think of displaying instructions, README files or copyright information. The method is also the base for readResource(), which you can use to read a file located inside the jar of your package (or anywhere else in your class path). It makes use of the class loader associated with the class it receives as parameter (via getResourceAsStream()).
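As an illustration of the two methods just described, the following is a rough sketch of how they could be written with the standard library only. The class name IoSketch, the parameter order and the choice of UTF-8 are my assumptions here; the real IOUtils may well differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual jUtils IOUtils code.
final class IoSketch {
  private IoSketch() {}

  /** Reads the stream until its end and returns the contents as a string (UTF-8 assumed). */
  static String readInputFully(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) buffer.write(chunk, 0, n);
    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Reads a class path resource, resolved through the class passed as parameter. */
  static String readResource(Class<?> clazz, String path) throws IOException {
    try (InputStream in = clazz.getResourceAsStream(path)) {
      if (in == null) throw new IOException("Resource not found: " + path);
      return readInputFully(in);
    }
  }
}
```

A call like IoSketch.readResource ( MyApp.class, "/README.txt" ) would then return the text of a README bundled in the jar (MyApp and the path are just placeholders).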

getHash() is one of the many implementations of a function that reads an input stream completely and computes a hash signature out of it. The getMD5() methods are wrappers around this more general method.
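The general method plus a thin MD5 wrapper can be sketched as follows, using java.security.MessageDigest. Names, signatures and the hexadecimal output format are assumptions made for the sake of the example, not necessarily what jUtils does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the actual jUtils code.
final class HashSketch {
  private HashSketch() {}

  /** Consumes the whole stream and returns its digest as a lower-case hex string. */
  static String getHash(InputStream in, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance(algorithm);
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) digest.update(chunk, 0, n);
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) hex.append(String.format("%02x", b));
    return hex.toString();
  }

  /** Wrapper around getHash() for the MD5 algorithm. */
  static String getMD5(InputStream in) throws IOException, NoSuchAlgorithmException {
    return getHash(in, "MD5");
  }
}
```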

DownloadUtils comes in handy whenever you need to download a remote URL-accessible document into a local file.
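For reference, the core of such a download helper can be quite small with java.nio. This is only a sketch under my own assumptions (class name, method name and the overwrite policy are invented here), with no retries, time-outs or progress reporting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the actual DownloadUtils code.
final class DownloadSketch {
  private DownloadSketch() {}

  /** Copies the document behind the URL into target, replacing it if it exists. Returns the bytes copied. */
  static long download(URL source, Path target) throws IOException {
    try (InputStream in = source.openStream()) {
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```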

Who Are You, Generic?

Have you ever written a Data Access Object (DAO)? Usually, such classes come in the form:

public class MyEntityDAO<E> {
  public MyEntityDAO ( Class<? extends E> ec ) { /* keep ec, to create new instances of E */ }
}

So, you have to make it dependent on a generic type (E) and, at the same time, pass a class of the same type, which will be used to create new instances of ‘E’. Usually, such a class is passed to the DAO’s constructor, in addition to being implicitly referred to via E, because there isn’t an easy way to do new E(): due to type erasure, E is not really a class at run-time. Cases like this are where the methods in the ReflectionUtils class come to help. getTypeArgument() receives a generic-declaring class C<T> and a sub-class D of C, such that D extends C<T1>, and tells you which concrete class T1 was bound to T by D. I copied the code for such methods (and added some bits) from an excellent post by Ian Robertson.
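The trick behind this kind of method is that, although E is erased, the declaration "D extends C<T1>" is recorded in D's class file and can be read back via reflection. Here is a deliberately simplified sketch: it only looks at the first type parameter and only works when a class in D's hierarchy extends C directly with a concrete class as argument. The names below (ReflectionSketch, SketchDao and so on) are invented for the example, and the real ReflectionUtils code, following Ian Robertson's post, covers more cases.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical, simplified sketch, not the actual ReflectionUtils code.
final class ReflectionSketch {
  private ReflectionSketch() {}

  /** Returns the class bound by sub to the first type parameter of base, or null if it cannot be resolved. */
  static <T> Class<?> getFirstTypeArgument(Class<T> base, Class<? extends T> sub) {
    // Walk up the hierarchy until we find the class that extends base directly
    Class<?> current = sub;
    while (current != null && !base.equals(current.getSuperclass())) {
      current = current.getSuperclass();
    }
    if (current == null) return null;

    Type parent = current.getGenericSuperclass();
    if (!(parent instanceof ParameterizedType)) return null; // raw extension, no binding recorded
    Type arg = ((ParameterizedType) parent).getActualTypeArguments()[0];
    return arg instanceof Class ? (Class<?>) arg : null; // type variables are not resolved in this sketch
  }
}

// Example hierarchy, used only to show the idea
abstract class SketchDao<E> {}
class SketchCustomer {}
class SketchCustomerDao extends SketchDao<SketchCustomer> {}
class SpecialSketchCustomerDao extends SketchCustomerDao {}
```

Here, getFirstTypeArgument ( SketchDao.class, SketchCustomerDao.class ) returns SketchCustomer.class, so a DAO base class could find the entity class by itself, instead of asking for it in the constructor.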

Playing with Collections

Several of the functions in the collections package are found in more popular libraries as well. I’ve implemented such functionality either before it was available elsewhere, or before I became aware of its existence. Either way, there are slight differences and, if you’ve already imported jUtils for something else, you have the collection stuff already with you, without any further external code.

ObjectStore implements a double-key map. For instance, if you need a small store of customers, orders and products, you can use an object store where the types are Customer.class, Order.class and Product.class, and the keys are emails, order numbers and product bar codes. Guava’s Table interface allows you to do the same thing, although based on the different abstract concept of a sparse matrix, where keys are identifiers of rows and columns. Our implementation is very similar to the one you find in Guava’s HashBasedTable concrete class.
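A double-key map of this kind is essentially a map of maps. The sketch below shows that structure in a few lines; the class name, the type parameters and the method names are assumptions made for illustration and do not reproduce the ObjectStore API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a double-key map, not the actual ObjectStore code.
final class ObjectStoreSketch<T, K, V> {
  // First level: type; second level: key within that type
  private final Map<T, Map<K, V>> store = new HashMap<>();

  /** Stores value under (type, key) and returns the previous value, if any. */
  V put(T type, K key, V value) {
    return store.computeIfAbsent(type, t -> new HashMap<>()).put(key, value);
  }

  /** Returns the value under (type, key), or null if there is none. */
  V get(T type, K key) {
    Map<K, V> inner = store.get(type);
    return inner == null ? null : inner.get(key);
  }

  /** Removes and returns the value under (type, key), or null if there is none. */
  V remove(T type, K key) {
    Map<K, V> inner = store.get(type);
    return inner == null ? null : inner.remove(key);
  }
}
```

For the customers/orders example, you would use something like ObjectStoreSketch<Class<?>, String, Object> and call put ( Customer.class, email, customer ).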

With ListUtils you can access any index of a list, without having to check its size. If you access an index beyond the list size, you’ll get back a null; if you add an element beyond the current list size, the list will be expanded by filling the gaps with nulls. This kind of behaviour may or may not be what you want: often it makes the code easier to write, but potentially at the cost of being more error-prone. Apache Commons has GrowthList, which implements similar functionality. Using their version of it, you need to either instantiate a new, growable list type, or decorate a list you already have in your hands. Our less object-oriented approach, where you access an existing list via static methods, can be quicker.
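The static-method approach can be sketched like this. Method names and signatures are assumed for the example (the real ListUtils may name them differently), and negative indexes are left to fail as usual.

```java
import java.util.List;

// Hypothetical sketch, not the actual ListUtils code.
final class ListUtilsSketch {
  private ListUtilsSketch() {}

  /** Returns the element at index, or null if the index is beyond the end of the list. */
  static <E> E get(List<E> list, int index) {
    return index < list.size() ? list.get(index) : null;
  }

  /** Sets the element at index, padding the list with nulls if it is too short. Returns the old value. */
  static <E> E set(List<E> list, int index, E value) {
    while (list.size() <= index) list.add(null);
    return list.set(index, value);
  }
}
```

Starting from a one-element list, set ( list, 3, "d" ) leaves you with four elements, the two in the middle being null.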

In the field I work in, bioinformatics, classes like AlphaNumComparator are often needed. Suppose you want to order a list of strings like “sample1.compound1”, “sample10.compound2”, “sample2.compound1” etc. If you use the ordinary string comparator for such a list, you (or your users) probably won’t like the result, since ‘sample10.compound2’ comes before ‘sample2.compound1’ based on the character-by-character comparison criterion. You need to split your strings into chunks that are truly strings and chunks that are actually numbers, so that you can use number-based comparison for the latter. This is exactly what AlphaNumComparator does. There are many different implementations of this idea around, each with its own flavour. We have taken this one; others can be found in this StackOverflow question.
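To show the chunking idea in practice, here is one possible minimal implementation. It is my own sketch, not the code jUtils adopted: it splits each string into runs of ASCII digits and runs of anything else, compares digit runs as numbers and the rest as plain strings.

```java
import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an alphanumeric comparator, not the actual AlphaNumComparator code.
final class AlphaNumSketch implements Comparator<String> {
  // A chunk is either a run of digits or a run of non-digits
  private static final Pattern CHUNK = Pattern.compile("[0-9]+|[^0-9]+");

  private static boolean isNumber(String chunk) {
    char c = chunk.charAt(0);
    return c >= '0' && c <= '9';
  }

  @Override
  public int compare(String a, String b) {
    Matcher ma = CHUNK.matcher(a), mb = CHUNK.matcher(b);
    while (true) {
      boolean hasA = ma.find(), hasB = mb.find();
      if (!hasA || !hasB) {
        // One string ran out of chunks: the shorter comes first; on a tie
        // (e.g. "a1" vs "a01") fall back to plain string order
        return hasA == hasB ? a.compareTo(b) : Boolean.compare(hasA, hasB);
      }
      String ca = ma.group(), cb = mb.group();
      int cmp = isNumber(ca) && isNumber(cb)
          ? new BigInteger(ca).compareTo(new BigInteger(cb))
          : ca.compareTo(cb);
      if (cmp != 0) return cmp;
    }
  }
}
```

Sorting the three strings above with new AlphaNumSketch() puts “sample2.compound1” before “sample10.compound2”, because 2 and 10 are compared as numbers.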

TypeCastCollection exposes a collection bound to the super-type T1 as a collection bound to the sub-type T (i.e., T extends T1). This is useful when you need to pass a Collection<T> to an existing method. The same can be obtained by means of the approach described in this post:

Collection<T1> csup;
Collection<T> csub = (Collection) csup;

However, our wrapper does more type-checking. In fact, the second statement in the code above compiles without complaints (apart from an unchecked warning), no matter the type of csub. You’ll discover a problem with incompatible types only at run-time and, of course, it could be tricky to figure out what’s going on. Using our wrapper, a type-mismatch problem can only occur when you fetch an element with a (wrapped) iterator and only if your original collection contains an element of type T1 that is not of type T, i.e., in a situation where generating a run-time error cannot be avoided anyway. For the moment, we don’t have implementations specific to other types of collections (lists, sets, maps etc); they would all be based on wrapping many methods declared in the standard collection library, which is a trivial syntactic formality. You’re welcome to make something like that for the collection type you need, and please send it back to us!
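A wrapper in this spirit can be sketched on top of java.util.AbstractCollection. This is an illustration under my own assumptions, not the TypeCastCollection source; in particular, passing the sub-type as a Class object, so that each element can be checked when it is fetched, is a choice made for this example.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;

// Hypothetical sketch, not the actual TypeCastCollection code.
final class TypeCastCollectionSketch<T1, T extends T1> extends AbstractCollection<T> {
  private final Collection<T1> base;
  private final Class<T> subType;

  TypeCastCollectionSketch(Collection<T1> base, Class<T> subType) {
    this.base = base;
    this.subType = subType;
  }

  @Override
  public Iterator<T> iterator() {
    Iterator<T1> it = base.iterator();
    return new Iterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      // The only place where a ClassCastException can arise: the element is not a T
      @Override public T next() { return subType.cast(it.next()); }
      @Override public void remove() { it.remove(); }
    };
  }

  @Override public int size() { return base.size(); }

  // Adding a T to a collection of T1 is always safe, since T extends T1
  @Override public boolean add(T element) { return base.add(element); }
}
```

The error, when it happens, is raised exactly at the offending element and comes from an explicit cast, rather than popping up later in some unrelated piece of code.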

Rule the Tests

JUnit rules are one of the nicest discoveries I’ve made about JUnit. They’re a simple yet powerful mechanism (inspired, I guess, by aspect-oriented programming) to change the normal behaviour of your tests without making too much mess in the tests themselves. For instance, you can use TestEntityMgrProvider in all those tests that need a JPA EntityManager to perform database-related operations. You just need to define an instance of TestEntityMgrProvider and this will be automatically initialised and disposed of before and after invoking test methods. Oh, I hear you, how is that different from @Before/@After? The difference is that you isolate this initialisation/shut-down behaviour in a single class and you make that class available to any test that needs it, even over multiple projects. The @Before/@After way would force you to mess up your test code with entity-manager related code and would likely produce code duplication (and I know what I’m saying by experience).

TestOutputDecorator is another (slightly more advanced) example of the power of JUnit’s test rules. The class prints out a header and a trailer before and after any test, reporting the name of the test class and the method being run in a readable way. Something like:

=-=-=-=-= testRemove(uk.ac...ObjectStoreTest) -=-=-=-=-=
Retrieved Object: Object 1.2
Entry 'Object 1.2' successfully deleted
=-=-= /end: testRemove(uk.ac...ObjectStoreTest) =-=-=-=

The decorator can be used as a test rule (typically in already existing projects where only part of the test classes implement such a feature), but JUnit, Maven and Surefire make it even simpler: you can just set TestOutputDecorator as a test listener in the Surefire plug-in configuration in a Maven project and all the tests will be nicely decorated! Of course jUtils itself makes use of this feature!

And More

I wrote checkMemory() in MemoryUtils when I was having some memory leaks, caused by an object that was supposed to live as long as the application. I could have debugged that, but it was a 3rd party class, so it was quicker to destroy the instance from time to time and create a new one. Yeah, but from time to time, when? Too often, and you waste time; too rarely, and you risk running out of memory. If no other strategy is available, you can decide to kill the bad guy when the problem it is causing can no longer be safely ignored: i.e., when the JVM’s reported free memory falls below a given percentage of the total available memory. This is what checkMemory() does. You can pass it a Java Runnable, to do whatever you need in your particular memory-stealing situation. Because very often it is wise to follow up such an action with some memory cleaning, the garbage collector is invoked for you in this method.
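The logic just described can be sketched with java.lang.Runtime. Mind that this is my reading of it, not the MemoryUtils source: the signature, the boolean return value and the exact way the free-memory ratio is computed are assumptions.

```java
// Hypothetical sketch, not the actual MemoryUtils code.
final class MemorySketch {
  private MemorySketch() {}

  /**
   * Runs action, followed by a garbage collection request, if the ratio of free
   * memory over the maximum available to the JVM is below minFreeRatio (0..1).
   * Returns true if the action was run.
   */
  static boolean checkMemory(Runnable action, double minFreeRatio) {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();
    double freeRatio = 1.0 - (double) used / rt.maxMemory();
    if (freeRatio >= minFreeRatio) return false;

    action.run(); // e.g., dispose of the leaking object and create a new one
    rt.gc();      // only a hint: the JVM decides if and when to collect
    return true;
  }
}
```

You would typically call something like checkMemory ( () -> resetLeakyComponent (), 0.2 ) every N iterations of your main loop, where resetLeakyComponent() stands for whatever clean-up you need.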

Last tiny bit: XPathReader simplifies the usage of the XPath API by caching the XML document against which searches are performed, after having parsed it from an XML input stream.
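The parse-once, query-many-times idea looks roughly like this with the standard javax.xml APIs. The class and method names below are invented for the sketch and may not match the XPathReader ones.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch, not the actual XPathReader code.
final class XPathReaderSketch {
  private final Document document; // parsed once, re-used by every query
  private final XPath xpath = XPathFactory.newInstance().newXPath();

  XPathReaderSketch(InputStream xml)
      throws IOException, SAXException, ParserConfigurationException {
    this.document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
  }

  /** Evaluates the expression against the cached document and returns the result as a string. */
  String readString(String expression) throws XPathExpressionException {
    return xpath.evaluate(expression, document);
  }

  /** Evaluates the expression against the cached document and returns the matching nodes. */
  NodeList readNodes(String expression) throws XPathExpressionException {
    return (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
  }
}
```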

That’s All

At least for now; I expect more to be added in future. Collecting this variety of functions in a single package has the disadvantage that you make it dependent on many different third-party packages (usually a utils package itself doesn’t grow at such a big speed). Well, at some point one has to refactor, creating utils-io, utils-strings etc. Will do in future if necessary. For the moment, enjoy!


Written by

I'm the owner of this personal site. I'm a geek, I'm specialised in hacking software to deal with heaps of data, especially life science data. I'm interested in data standards, open culture (open source, open data, anything about openness and knowledge sharing), IT and society, and more. Further info here.
