Remote repositories and reproducibility
I can’t help but shiver when I read TheServerSide’s discussion about version control and come across comments such as:
Maven, for example, discourages storage of dependencies in revision control, preferring to grab them from a third party repository
And another comment:
Since I use Maven 2 : pom.xml + src directory. That’s all I need [...]
And later, in another thread:
I have students doing internship where I work and they complain when they download some source code and it comes with an Ant build. They are more use to Maven (since this is what we use) and prefer it.
Here we obviously have conflicting views of software development: long-term predictability versus short-term ‘works for me’. The ‘works for me’ attitude has been the rationale behind flawed designs and processes, the original JBoss UCL design for example. Some users simply don’t know why their build happens to work; it just does, and they are happy with that. The Pragmatic Programmers call this Programming by Coincidence, so by extension we could call this technique Build by Coincidence.
That alone is enough to explain why some users fail to build even decent projects that use Maven this way, and end up writing comments such as:
#1 I’ve yet to be able to get an external Maven project to build by simply checking it out[...]For some reason – maybe bad luck or maybe because I tend to consult at larger corporations – Maven can never download all the jars. Either it can’t find it or errors out or something.
Reasonably large Maven projects that pull external dependencies from remote repositories, and that run into problems on a regular basis, are notorious: Apache Geronimo, Apache Cocoon, Apache DS. Browsing their mailing lists regularly is enough to see how Maven gets in your way and slows down development when you have many dependencies and a remote repository (or worse, several).
Konstantin Ignatyev is also right on target when he says:
Maven dependencies management is really really bad.[...]Ranges are especially bad: they cause build unpredictability and non repeatability because they make build to depend on server repo content.
A simple example illustrates how badly things can go wrong:
- Assume MyProject depends on JasperReports 1.2.4
- which itself depends on commons-collections [2.1,), as its POM shows (meaning 2.1 or later)
Now, if we look at commons-collections in the Mergere repository, what is available as 2.1+? Among other things: 2.1.1, 3.0, 3.1, 3.2… and 20030418.083655, 20031027.000000, 20040102.233541, 20040616.
So which version do you think Maven considers the most recent (i.e. the greatest): 3.2 or 20040616?
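To see why the date-stamped version wins, here is a simplified model of how Maven 2.0’s version comparison treated these strings. It is an approximation, not Maven’s actual code: the leading dot-separated numbers are read as major, minor and incremental integers, so 20040616 becomes major version 20040616 and beats 3.2. The class names NaiveVersion and RangeDemo are hypothetical.

```java
import java.util.List;

// Simplified model (an approximation, not Maven's real class) of Maven 2.0's
// version comparison: up to three dot-separated integers are read as
// major.minor.incremental, and missing parts count as 0.
// Qualifiers such as "-beta" are not handled by this sketch.
final class NaiveVersion implements Comparable<NaiveVersion> {
    final String raw;
    private final int[] parts = new int[3];

    NaiveVersion(String raw) {
        this.raw = raw;
        String[] tokens = raw.split("\\.");
        for (int i = 0; i < tokens.length && i < parts.length; i++) {
            parts[i] = Integer.parseInt(tokens[i]); // "083655" parses as 83655
        }
    }

    @Override
    public int compareTo(NaiveVersion other) {
        for (int i = 0; i < parts.length; i++) {
            int cmp = Integer.compare(parts[i], other.parts[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}

final class RangeDemo {
    // Resolves an open-ended range such as "[2.1,)": keep every version
    // >= the lower bound, then take the greatest one.
    static String newestAtLeast(String lowerBound, List<String> available) {
        NaiveVersion lower = new NaiveVersion(lowerBound);
        NaiveVersion best = null;
        for (String v : available) {
            NaiveVersion candidate = new NaiveVersion(v);
            if (candidate.compareTo(lower) >= 0
                    && (best == null || candidate.compareTo(best) > 0)) {
                best = candidate;
            }
        }
        return best == null ? null : best.raw;
    }

    public static void main(String[] args) {
        List<String> mergere = List.of("2.1.1", "3.0", "3.1", "3.2",
                "20030418.083655", "20031027.000000", "20040102.233541", "20040616");
        System.out.println(newestAtLeast("2.1", mergere)); // prints 20040616
    }
}
```

The result depends entirely on what the repository contains on the day you build: drop the date-stamped entries from the list and the same range resolves to 3.2.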
Relying on uncontrolled remote repositories is evil at best.
Never trust online repositories for your project; that’s OK for a prototype, but nothing more.
The irony is that a few helpful hands may fix this particular problem after reading this entry… but thousands of users who were depending on these artifacts will not notice until they clean their cache and download the dependencies again, which may break something in their project. That is non-reproducibility.
Put every dependency in source control: download the archives yourself, rewrite the POMs yourself (most of the time they are incorrect, though you are sure to get the 350-line list of developers) and be in control. Clean up your cache, take your machine offline, and build. If it does not build right away, you are in trouble anyway; it is just a matter of time before this WMD blows up your product.
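One way to enforce the ‘take your machine offline and build’ rule is a pre-build check that refuses to proceed unless every jar is already present in the version-controlled lib directory. This is a hypothetical sketch: the class OfflineDependencyCheck, the flat lib directory layout, and the per-jar SHA-256 map are all illustrative. Recording a checksum is not something the post prescribes; it is an assumed extra guard against a jar being silently replaced.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-build check: every jar must already sit in the
// version-controlled lib directory and match its recorded SHA-256,
// so the build never needs to reach a remote repository.
final class OfflineDependencyCheck {

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // negative bytes print as two hex digits
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns a list of problems; an empty list means the build can run offline.
    static List<String> verify(Path libDir, Map<String, String> pinnedSha256) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : pinnedSha256.entrySet()) {
            Path jar = libDir.resolve(entry.getKey());
            if (!Files.isRegularFile(jar)) {
                problems.add("missing: " + entry.getKey());
                continue;
            }
            try {
                String actual = sha256Hex(Files.readAllBytes(jar));
                if (!actual.equalsIgnoreCase(entry.getValue())) {
                    problems.add("checksum mismatch: " + entry.getKey());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return problems;
    }
}
```

In a real build the pinned map would itself live in source control next to the jars, and a non-empty result would stop the build before compilation starts.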
Final advice: Use Ivy for dependencies and store all your dependencies under source control.

Lesson being: store your Maven repository under version control and don’t use the public Maven repositories. Instead, build your own internal company one.
I’ve not used Ant in anger since the ability to include (or something like that) was added. I used to choose Maven because Ant meant lots and lots of duplicated build.xml files between unrelated projects. I’ve been using Ant more recently (building open source projects at work, we use the lowest common denominator, which is Ant) and it’s starting to grow on me again.
Comment by Henri Yandell — July 18, 2006 @ 5:58 am
The advice regarding public repositories is right.
But it does not have to be this way, public repositories can be trusted and should be.
Gentoo, Ubuntu and many others rely on public repositories heavily and successfully.
Perl folks make very good use of CPAN, and Rubyists of RubyGems.
In Java all the necessary ingredients are available; it is a matter of agreeing on a few very simple principles:
- all artifacts have to be signed with trusted certificates, as outlined in http://www.apache.org/dev/release-signing.html; if that were the case, it would not matter where a particular jar had been downloaded from;
- no ranges: dependencies have to be specified explicitly. The dependency manager should allow users to override dependency declarations via parameters or a predefined configuration file;
- dependency declarations should be versioned separately from artifacts (Ivy actually implements this). I mean that if artifact a-1.0 depends on b-2.1, then there should be a dependency declaration a-dd-1.0 that points to a-1.0 and b-2.1; then, when b-2.2 is released and is compatible with a-1.0, a dependency declaration a-dd-1.1 needs to be released that points to a-1.0 and b-2.2.
Comment by Konstantin — August 9, 2006 @ 9:45 pm
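Konstantin’s second and third points (no ranges, and dependency declarations versioned separately from the artifacts they describe) can be sketched together. This is a hypothetical model, not Ivy’s API: the class DependencyDeclaration and its resolve method are made up for illustration. A declaration carries its own version, pins each dependency to an exact version, rejects Maven-style range syntax, and lets the user override an existing pin explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical dependency declaration versioned separately from the artifact
// it describes (e.g. a-dd-1.1 -> a-1.0 + b-2.2). Not Ivy's actual API.
final class DependencyDeclaration {
    final String name;               // e.g. "a-dd"
    final String declarationVersion; // e.g. "1.1"
    final String artifact;           // e.g. "a-1.0"
    final Map<String, String> pins;  // dependency name -> exact version

    DependencyDeclaration(String name, String declarationVersion,
                          String artifact, Map<String, String> pins) {
        for (Map.Entry<String, String> pin : pins.entrySet()) {
            if (isRange(pin.getValue())) {
                throw new IllegalArgumentException(
                        "range not allowed for " + pin.getKey() + ": " + pin.getValue());
            }
        }
        this.name = name;
        this.declarationVersion = declarationVersion;
        this.artifact = artifact;
        this.pins = Map.copyOf(pins);
    }

    // Treats Maven-style range syntax such as "[2.1,)" or "(,3.0]" as a range.
    static boolean isRange(String version) {
        return version.startsWith("[") || version.startsWith("(") || version.contains(",");
    }

    // Returns the pinned versions with explicit user overrides applied
    // (e.g. read from a config file). Overrides must be exact versions
    // and may only replace a dependency that is already pinned.
    Map<String, String> resolve(Map<String, String> overrides) {
        Map<String, String> resolved = new LinkedHashMap<>(pins);
        for (Map.Entry<String, String> o : overrides.entrySet()) {
            if (!resolved.containsKey(o.getKey())) {
                throw new IllegalArgumentException("unknown dependency: " + o.getKey());
            }
            if (isRange(o.getValue())) {
                throw new IllegalArgumentException("range not allowed in override for " + o.getKey());
            }
            resolved.put(o.getKey(), o.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        DependencyDeclaration v10 = new DependencyDeclaration("a-dd", "1.0", "a-1.0", Map.of("b", "2.1"));
        DependencyDeclaration v11 = new DependencyDeclaration("a-dd", "1.1", "a-1.0", Map.of("b", "2.2"));
        System.out.println(v10.resolve(Map.of())); // prints {b=2.1}
        System.out.println(v11.resolve(Map.of())); // prints {b=2.2}
    }
}
```

Because a-dd-1.0 never changes once published, a build that names it resolves to exactly the same jars no matter what appears in the repository later; picking up b-2.2 is a deliberate switch to a-dd-1.1.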