My Maven Archetype for Java Command Line Projects

Allegory of the Cave, C v Haarlem. Source: Wikipedia.

A window manager is a computer application that eases the use of multiple shell sessions at the same time.

I found this (not-so-much) joke on a geek forum (where else?), back in the long-gone days of Usenet. It highlights
very well how great the good old command line interface can be, if you know how to use it.

Geeks do many things from that "black thing" (this is how my girlfriend refers to it) that could be done
from a GUI, partly out of hubris, partly because it can be so much more practical.

But the most important thing about tools based on the command line interface (CLI) is that you can do much
more with them, especially by combining multiple tools through mechanisms like piping. Often, they are the only way
to completely automate tasks, for instance in continuous integration or data processing
pipelines.

In this post, I’ll show how you can create Maven projects that produce command line tools, so that
your users can easily run a Java CLI with complex argument parsing and dependency
management.

Everything is based on a real project, which provides a Maven archetype, ie, a template, to build
your own CLI Maven projects, without the need to start from scratch every time. If you’re just interested
in reusing that project of mine, skip this article and go to the above github link, but looking at the details is
more fun and could be useful to craft your own CLI tools.

Command Line Tools in Java

Like pretty much any other programming language, Java offers mechanisms to invoke a program from the command line,
including passing arguments to it (ie, parameters), returning an operating system exit code to the caller and
linking .jar or single .class dependencies.

If you’re reading this article, you probably already know the basics: once you’ve defined a class, let’s say
Hello.java, which contains a conventional static main() method, you can invoke the program by simply typing:

$ java com.foo.example.Hello --opt1 Argument1 --opt2 Argument2 
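For reference, a class of this kind can be as small as the following sketch. BasicHello is a made-up name for illustration (it isn't part of my project), and it only shows the two basic mechanisms mentioned above: receiving the arguments and returning an exit code to the operating system.

```java
// Hypothetical class, used only for this sketch
class BasicHello
{
  // Returns the exit code instead of calling System.exit() directly,
  // which makes the logic easy to call from tests
  static int run ( String... args )
  {
    if ( args.length == 0 )
    {
      System.err.println ( "Usage: BasicHello <name>..." );
      return 2; // non-zero tells the calling shell that something went wrong
    }
    System.out.println ( "Hello, " + String.join ( ", ", args ) + "!" );
    return 0;
  }

  public static void main ( String... args )
  {
    // The value passed to exit() is what the shell sees in $?
    System.exit ( run ( args ) );
  }
}
```

Keeping the logic in run() and leaving only System.exit() in main() is a design choice of this sketch, it lets you check the exit code without terminating the JVM.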

However, that’s the simple case. In real life, things are more complicated:

  • You typically want your CLI command to support a rich set of --options, which involves validating the syntax,
    auto-generating the --help output from the option descriptions, and collecting the options specified in a
    given invocation, so that they can be used in the task that your program implements. And, since this parsing is
    very generic, you'll want a dedicated and reusable library for it.
  • Typically, you can’t just invoke your Hello, you need a number of third-party .jar files to be passed to the java
    command, via the -cp option.
  • This makes the command lines based on java invocation long and hard to write, so you will also want to
    craft a script that allows for a much simpler invocation, something like:
    ./hello.sh --opt1 Argument1 --opt2 Argument2, with all the rest hidden in the script.
  • You also want to write such a script so that it supports features like invocation from any path (without any
    need to cd into the home of your CLI tool), picking up configuration files and setting environment variables
    like Java debugging options.
  • And you need to put together your program, .jar files and the launching script into some easy-to-install
    distribution archive, .zip or the like.

Let’s do it!

The picocli library

Fortunately, many of the points above are wheels that you don’t need to reinvent.

Let’s start from the first point: there are many CLI option parsers around. Recently, I’ve had much fun with
picocli, which supports the definition of a command line tool through intuitive and
easy-to-use Java annotations.

Here is an example:

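import static java.lang.System.out;

import java.util.concurrent.Callable;

import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;
import picocli.CommandLine.Parameters;

// A picocli command can be implemented as a Callable, its return value becomes the exit code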
@Command ( 
  name = "runme", description = "Command Line Example.", 
  mixinStandardHelpOptions = true
)
public class Hello implements Callable<Integer>
{
  // A picocli --option
  @Option ( names = { "-n", "--name" }, description = "Try passing me your name" )
  private String name = "Dude";

  // The positional parameters (remaining after - or --options)
  @Parameters ( description = "The command parameters" )
  private String[] params = new String [ 0 ];

  @Override
  public Integer call() throws Exception
  {
    out.println ( "Hello, " + name + "!" );
    out.println ( "These are the params you sent me: " );
    for ( String param: params ) 
      out.print ( param + "\t" );
    out.println ();
    return 0;
  }

  public static void main ( String... args )
  {
    // Just delegate the CLI processing and execution to the picocli command
    int exitCode = 0;
    try {
      var cmd = new CommandLine ( new Hello () );
      exitCode = cmd.execute ( args );
    }
    catch ( Throwable ex ) {
      ex.printStackTrace ( System.err );
      exitCode = 1;
    }
    finally {
      System.exit ( exitCode );
    }
  }
}

As you can see, you can define a CLI command as a Callable class and define its options and other details by
means of Java annotations. Then, the picocli framework is able to parse a set of actual CLI arguments
and bind them to your class’s fields. Other features are available too, such as the mentioned
generation of the --help output (ie, if the above class is invoked with --help, a description of the command
and its options is printed; an unknown option like --blah yields an error message followed by the usage description).
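If you're curious about how annotations can be hooked to fields, here is a toy sketch based on plain Java reflection. To be clear, this is not how picocli is implemented and it doesn't use picocli at all: the Opt annotation, the ToyBinder class and the Greeting class are all made up for illustration, and the sketch only handles "--option value" pairs bound to String fields.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// All the names here (ToyBinder, Opt, Greeting) are made up for this sketch
class ToyBinder
{
  // Must be retained at runtime, else reflection can't see it
  @Retention ( RetentionPolicy.RUNTIME )
  @Target ( ElementType.FIELD )
  @interface Opt
  {
    String name ();
  }

  static class Greeting
  {
    @Opt ( name = "--name" )
    String name = "Dude";
  }

  // Scans args for "--option value" pairs and copies each value into the
  // String field that is annotated with the matching option name
  static <T> T bind ( T target, String... args )
  {
    for ( int i = 0; i + 1 < args.length; i++ )
      for ( Field field: target.getClass ().getDeclaredFields () )
      {
        Opt opt = field.getAnnotation ( Opt.class );
        if ( opt == null || !opt.name ().equals ( args [ i ] ) ) continue;
        try {
          field.setAccessible ( true );
          field.set ( target, args [ i + 1 ] );
        }
        catch ( IllegalAccessException ex ) {
          throw new IllegalStateException ( "Cannot set " + field.getName (), ex );
        }
      }
    return target;
  }

  public static void main ( String... args )
  {
    Greeting greeting = bind ( new Greeting (), args );
    System.out.println ( "Hello, " + greeting.name + "!" );
  }
}
```

A real parser does much more than this (type conversion, validation, help generation, positional parameters), which is exactly why a library is the better choice.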

The Maven Assembly plug-in

So, we have a cool way to easily define the syntax of a command line tool. As said above, now we have to put it together
with all the rest: dependencies, launching scripts, default config files, etc.

Maven is a good way to do it. As you probably know, this is a software build system, which can automate much of the work of
building a piece of software, including downloading its third-party dependencies, running automated tests and generating the final
project artifacts, such as a .jar or a .war.

Maven is organised through many plug-ins, each serving a particular task. The Assembly plug-in is one of the most used, as it is
so useful for packing entire self-contained applications, with all the files they need, into a single archive file.

To use Maven and the Assembly plug-in, let’s start by defining a simple CLI project. This is a Maven ‘jar’ project
type, where the Assembly plugin uses the sources and other files to build a final distribution .zip, the archive
your users can download and install by unzipping its contents into some directory (where a directory like
cli-ref-project/ will be created).

Here is a skeleton of the project structure:

cli-ref-project
cli-ref-project/pom.xml
cli-ref-project/src/test/resources/log4j2.yaml
cli-ref-project/src/test/java/uk/ac/ebi/example/AppTest.java
cli-ref-project/src/main/assembly/pkg.xml
cli-ref-project/src/main/assembly/resources/log4j2.yaml
cli-ref-project/src/main/assembly/resources/run.sh
cli-ref-project/src/main/java/uk/ac/ebi/example/App.java

And this is the POM:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>uk.ac.ebi.maven</groupId>
  <artifactId>cli-ref-project</artifactId>
  <version>1.0-SNAPSHOT</version>
  <description>This is a test project used as a base for cli-archetype.</description>

  ...

  <dependencies>
    <dependency>
      <groupId>info.picocli</groupId>
      <artifactId>picocli</artifactId>
      <version>4.5.2</version>
    </dependency>
    ...
  </dependencies>

  <build>

    <finalName>${project.artifactId}_${project.version}</finalName>
    ...
    <plugins>
      ...
      <!-- The package for line commands is built through this, details later in the post -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <!-- Arranges the permissions of the final archive -->
          <archiverConfig>
            <fileMode>0755</fileMode>
            <directoryMode>0755</directoryMode>
            <defaultDirectoryMode>0755</defaultDirectoryMode>
          </archiverConfig>
        </configuration>
        <executions>
          <execution>
            <id>pkg</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
            <configuration>
              <finalName>${project.artifactId}_${project.version}</finalName>
              <appendAssemblyId>false</appendAssemblyId>
              <attach>true</attach>
              <descriptors>
                <!-- All the details on how to make the distribution zip -->
                <descriptor>src/main/assembly/pkg.xml</descriptor>
              </descriptors>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

While the Assembly plug-in can often be configured entirely from simple POM-level declarations, the best way to use it seriously is
to define a descriptor.

Here is an excerpt of the descriptor from my CLI archetype:

<!-- 
  File for the Maven Assembly plug-in.

   This produces a binary that contains all the dependencies needed to run this command line tool (in lib/), 
   plus everything that lies on src/main/assembly/resources/ (putting the contents of this folder on the top
   of the final binary file).   
 -->
<assembly>
  <id>pkg</id>

  <formats>
    <format>zip</format>
  </formats>

  <dependencySets>
    <!-- All the .jar dependencies computed by Maven, of course it is based on transitive closure -->
    <dependencySet>
      <!-- Enable only if non-empty <outputFileNameMapping></outputFileNameMapping> -->
      <!-- the final distribution zip will have lib/*.jar in here -->
      <outputDirectory>/lib</outputDirectory>
      <unpack>false</unpack>
      <scope>runtime</scope>
      ...
    </dependencySet>
  </dependencySets> 

  <fileSets>
    <fileSet>
      <!-- Also, includes these files into the .zip. This includes run.sh (which refers to lib/*.jar) 
           and other distro files like log4j2.yaml -->
      <directory>src/main/assembly/resources</directory>
      <outputDirectory></outputDirectory>
      <filtered>true</filtered>
    </fileSet>

    ...
  </fileSets>

</assembly>

Once you’ve arranged things like above, you can build your CLI project by just typing mvn package. As
usual, Maven will first run the JUnit tests in src/test and then the Assembly invocation will create the
distro .zip in target/. So, try the result:

$ mvn clean package
...
$ cd target
$ unzip cli-ref-project_1.0-SNAPSHOT.zip
Archive:  cli-ref-project_1.0-SNAPSHOT.zip
   creating: cli-ref-project_1.0-SNAPSHOT/
  inflating: cli-ref-project_1.0-SNAPSHOT/log4j2.yaml
  inflating: cli-ref-project_1.0-SNAPSHOT/run.sh
   creating: cli-ref-project_1.0-SNAPSHOT/lib/
  inflating: cli-ref-project_1.0-SNAPSHOT/lib/log4j-api-2.14.0.jar
  inflating: cli-ref-project_1.0-SNAPSHOT/lib/log4j-core-2.14.0.jar
  ...
$ cd cli-ref-project_1.0-SNAPSHOT
$ ./run.sh --help
Usage: runme [-hV] [-n=<name>] [<params>...]
Command Line Example.
      [<params>...]   The command parameters
  -h, --help          Show this help message and exit.
  -n, --name=<name>   Try passing me your name
  -V, --version       Print version information and exit.

$ ./run.sh --name Reader One Two Three
Hello, Reader!
These are the params you sent me:
One Two Three

$

What do you have in that run.sh?

You can find all the details on github. As mentioned above, this prepares the JVM invocation for the end user, hiding the tedious details from them.
Here is an excerpt:

#!/bin/bash

# These are passed to the JVM. They're appended only if missing, so that you can predefine them from the shell
[[ "$JAVA_TOOL_OPTIONS" =~ -Xm[sx] ]] || JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Xms2G -Xmx4G"

# We always work with universal text encoding.
[[ "$JAVA_TOOL_OPTIONS" =~ -Dfile.encoding ]] || JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dfile.encoding=UTF-8"

...

export JAVA_TOOL_OPTIONS

# You shouldn't need to change the rest
#
###

cd "$(dirname "$0")"
mydir="$(pwd)"

...

export CLASSPATH="$CLASSPATH:$mydir:$mydir/lib/*"

# See here for an explanation about ${1+"$@"} :
# http://stackoverflow.com/questions/743454/space-in-java-command-line-arguments 

java uk.ac.ebi.example.App ${1+"$@"}
ex_code=$?

# We assume stdout is for actual output, that might be pipelined to some other command, the rest (including logging)
# goes to stderr.
# 
echo Java Finished. Quitting the Shell Too. >&2
echo >&2
exit $ex_code

So, as you can see, the script sets up common JVM options, deals with the classpath and takes care of bubbling up the OS exit code that the
Java command returned to us.
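Just to make those three steps explicit, here is how a rough Java counterpart of the launcher could look. This is only a sketch under a few assumptions: LauncherSketch is a made-up class, it passes the options on the command line (via -cp and -D) rather than through the JAVA_TOOL_OPTIONS and CLASSPATH variables that the script uses, and it assumes that the current directory is the tool's home.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java counterpart of the launcher script, for illustration only
class LauncherSketch
{
  // Builds the same kind of command line that the script ends up running
  static List<String> buildCommand ( Path home, String mainClass, String... args )
  {
    // The tool's home plus lib/* (the JVM expands the * to all the .jar files in lib/)
    String classpath = home + File.pathSeparator + home.resolve ( "lib" ) + File.separator + "*";

    List<String> cmd = new ArrayList<> ();
    cmd.add ( "java" );
    cmd.add ( "-Dfile.encoding=UTF-8" );
    cmd.add ( "-cp" );
    cmd.add ( classpath );
    cmd.add ( mainClass );
    cmd.addAll ( List.of ( args ) );
    return cmd;
  }

  // Runs the command, sharing stdin/stdout/stderr with this process, and
  // returns the child's exit code, so that the caller can pass it on
  static int launch ( List<String> cmd ) throws IOException, InterruptedException
  {
    return new ProcessBuilder ( cmd ).inheritIO ().start ().waitFor ();
  }

  public static void main ( String... args ) throws Exception
  {
    // Assumption: the current directory is the tool's home
    Path home = Path.of ( "" ).toAbsolutePath ();
    System.exit ( launch ( buildCommand ( home, "uk.ac.ebi.example.App", args ) ) );
  }
}
```

In practice a shell script remains the more natural choice here, since you would need yet another launcher to start this class.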

Don’t-Repeat-Yourself, use Maven Archetypes

Cool! We managed to create a nice CLI project with Maven. It can be updated and rebuilt as many times as you
want, and its build+deployment can also be added to a Continuous Integration pipeline for auto-deployment
(maybe one day I’ll write a post on this too…)

But, what if I need to develop dozens of CLI tools like the above in the coming months?!

Maybe you’ve already heard about Maven archetypes. The Maven documentation describes what they are
pretty well:

In short, Archetype is a Maven project templating toolkit. An archetype is defined as an original pattern
or model from which all other things of the same kind are made. The name fits as we are trying to provide a
system that provides a consistent means of generating Maven projects. Archetype will help authors create
Maven project templates for users, and provides users with the means to generate parameterised versions of
those project templates.

Sounds like what we need, doesn’t it? We can use the CLI project above as a base to create an archetype.
Every time you need to create a new command line project, you’ll do it starting from the CLI
archetype.

Let’s see the details. This is now the Maven project for the archetype:

$ find cli-archetype
cli-archetype
cli-archetype/maven-settings.xml
cli-archetype/pom.xml
...
cli-archetype/src
cli-archetype/src/main/resources/META-INF/maven/archetype-metadata.xml
cli-archetype/src/main/resources/archetype-resources
cli-archetype/src/main/resources/archetype-resources/pom.xml
cli-archetype/src/main/resources/archetype-resources/src/test/resources/log4j2.yaml
cli-archetype/src/main/resources/archetype-resources/src/test/java/uk/ac/ebi/example/AppTest.java
cli-archetype/src/main/resources/archetype-resources/src/main/assembly/pkg.xml
cli-archetype/src/main/resources/archetype-resources/src/main/assembly/resources/run.sh
cli-archetype/src/main/resources/archetype-resources/src/main/java/uk/ac/ebi/example/App.java

The main POM defines the project as, well, an archetype project (we’ll see that in a moment), and then
this kind of project is expected to have, in addition to a general description in archetype-metadata.xml,
the template files somewhere in src/main/resources/archetype-resources/. Such template files are
essentially the sample CLI project that we have seen above, with some adaptations to make them parameterised.

This is the archetype’s POM:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>uk.ac.ebi.maven</groupId>
  <artifactId>cli-archetype</artifactId>
  <version>3.0-SNAPSHOT</version>
  ...
</project>  

Yeah, that’s it! It’s like a default project, but it has the right archetype files and thus it can be used
as such.

The POM inside archetype-resources/ is almost the same as the original project we showed above, the
only difference is:

  <groupId>${groupId}</groupId>
  <artifactId>${artifactId}</artifactId>
  <version>${version}</version>

and the named Maven properties will be instantiated with actual values when the archetype is invoked (taking
such values from the invocation’s parameters).
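To get a feeling for what this kind of placeholder replacement amounts to, here is a toy sketch. It is not the engine that Maven actually uses for archetypes, FilterSketch is a made-up class, and the sketch only handles plain ${name} placeholders, leaving the unknown ones untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A toy version of placeholder replacement, not Maven's actual implementation
class FilterSketch
{
  private static final Pattern PLACEHOLDER = Pattern.compile ( "\\$\\{([^}]+)\\}" );

  // Replaces every ${name} with its value in props, leaving unknown names untouched
  static String filter ( String template, Map<String, String> props )
  {
    Matcher matcher = PLACEHOLDER.matcher ( template );
    StringBuilder out = new StringBuilder ();
    while ( matcher.find () )
    {
      String value = props.getOrDefault ( matcher.group ( 1 ), matcher.group () );
      // quoteReplacement() stops $ and \ in the value from being read as group references
      matcher.appendReplacement ( out, Matcher.quoteReplacement ( value ) );
    }
    matcher.appendTail ( out );
    return out.toString ();
  }

  public static void main ( String... args )
  {
    String line = "<artifactId>${artifactId}</artifactId>";
    System.out.println ( filter ( line, Map.of ( "artifactId", "testcli" ) ) );
  }
}
```

With the values used in the main() above, the template line becomes <artifactId>testcli</artifactId>.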

To create a new CLI project, you first need to install the archetype artifact locally (mvn install), or deploy it to some
artifact repository. Something like:

$ cd cli-archetype
$ mvn deploy
...
$ 

After that, the archetype is ready to be used:

# First, download the archetype (if not already installed locally, this is where I'm deploying mine)
$ mvn dependency:get -Dartifact=uk.ac.ebi.maven:cli-archetype:3.0-SNAPSHOT \
      -DremoteRepositories=https://mbrandizi.jfrog.io/artifactory/maven

# And now run it
$ mvn archetype:generate\
  -DgroupId=info.marcobrandizi.test -DartifactId=testcli -Dversion=1.0-SNAPSHOT\
  -DarchetypeGroupId=uk.ac.ebi.maven -DarchetypeArtifactId=cli-archetype -DarchetypeVersion=3.0-SNAPSHOT 
...
$ 

After that, you should have a new CLI project in the testcli/ directory:

$ cd testcli
$ ll
total 24
-rw-r--r--  1 brandizi  wheel   4.0K  8 Jan 16:09 pom.xml
drwxr-xr-x  4 brandizi  wheel   128B  8 Jan 16:09 src/

# As above!
$ mvn package
$ cd target
$ unzip testcli_1.0-SNAPSHOT.zip
$ cd testcli_1.0-SNAPSHOT
$ ./run.sh
...
$ 

And now you can start changing the sample project to create an actual command line tool!

Hold on, but do I need to type those mvn commands every time?!

No.

You can automate the archetype invocation by means of a simple create-project.sh script.
You’ll find that on my github project. Such a script needs the archetype
coordinates (ie, the version), so the final version that I’ve left there for you to use is actually
generated from a template, where the version is put in place by Maven during the archetype build.
This is done via the well-known filtering feature: files can refer to POM and other Maven properties,
and the filtering creates copies of those files where the properties have been replaced with the actual values.
If you look at my Maven POM for the archetype, you’ll see the tweaks that make this possible in the
<resources> section.

Creating a new project this way is as simple as:

$ ./create-project.sh info.marcobrandizi.test testcli 1.0-SNAPSHOT

or, if you want to use my copy on github, without even having to download it first:

$ create_cli_url="https://raw.githubusercontent.com/marco-brandizi/cli-archetype/master/create-project.sh"    
$ curl -L "$create_cli_url" | sh -s info.marcobrandizi.test testcli 1.0-SNAPSHOT

Inside the script, things are linked to my project and my Maven artifact repository (many thanks to JFrog for
making that available for free). Of course, you can do the same with your own coordinates and for any type
of new archetype (eg, for SpringBoot projects, for your organisation’s projects that need common dependencies
like JUnit, you name it!)

Could we do it better?

A possible problem with the approach above is that common .jar dependencies (as well as common config
files) are not factored out: multiple CLI distributions will all contain the same files like log4j*.jar,
picocli*.jar, spring-something*.jar etc. and you could end up having many copies of such files in your
system. While nowadays hard disk space is cheap, this can be problematic if, for example, you want to
centralise the configuration for all of your tools.

This could be addressed specifically in your project, eg, your tool might look for a config file first
in /etc, then in $HOME, then in the tool’s home.
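A lookup like that could be sketched as follows. ConfigLocator and the mytool.yaml file name are made up for illustration, and the sketch uses the current directory as a stand-in for the tool's home.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; the file name and the directories are only examples
class ConfigLocator
{
  // Returns the first candidate directory that contains fileName, in the given order
  static Optional<Path> findConfig ( String fileName, List<Path> dirs )
  {
    return dirs.stream ()
      .map ( dir -> dir.resolve ( fileName ) )
      .filter ( Files::isRegularFile )
      .findFirst ();
  }

  public static void main ( String... args )
  {
    List<Path> dirs = List.of (
      Path.of ( "/etc" ),
      Path.of ( System.getProperty ( "user.home" ) ),
      Path.of ( "" ).toAbsolutePath () // stands in for the tool's home
    );
    String result = findConfig ( "mytool.yaml", dirs )
      .map ( Path::toString )
      .orElse ( "no config found" );
    System.out.println ( result );
  }
}
```

The order of the list decides which location wins, so swapping the entries is enough if you prefer user-level files to override the system-wide one.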

An alternative could be some end user-level package manager, like Debian APT (and all the other Linux
package managers) or Python’s pip. For some reason, nothing like that has ever taken off in the Java
world, despite past projects like Java Web Start or JPM.

Please, let me know if you know more about that!


Written by

I'm the owner of this personal site. I'm a geek, I'm specialised in hacking software to deal with heaps of data, especially life science data. I'm interested in data standards, open culture (open source, open data, open everything), IT and society, and more. Further info at http://marcobrandizi.info/about-me.
