Saturday, February 7, 2015

A sales pitch explanation of NixOS

Exactly one week ago, I visited FOSDEM for the seventh time. In this year's edition, we had a NixOS stand to promote NixOS and its related sub projects. Promoting NixOS is a bit of a challenge, because properly explaining its underlying concepts (the Nix package manager) and their benefits is often not that straightforward.


Explaining Nix


Earlier, I wrote two recipes for explaining the Nix package manager, each having its pros and cons. The first recipe basically explains Nix from a system administrator's perspective -- it starts by explaining the disadvantages of conventional approaches and then what Nix does differently: namely storing packages in isolation, in separate directories in the Nix store, using hash codes as prefixes. Usually when I show this to people, there is a justification process involved, because these hash codes look weird and counter-intuitive. Sometimes it still works out despite the confusion, sometimes it does not.

The other recipe explains Nix from a programming language perspective, since Nix borrows its underlying concepts from purely functional programming languages. In this recipe, I first explain how purely functional programming languages differ from conventional programming languages. Then I draw an analogy to package managers. I find this the better explanation recipe, because the means used to make Nix purely functional (e.g. using hash codes) make sense in this context. The only drawback is that a large subset of the people using package managers are not programmers and typically neither understand nor appreciate the programming language aspect.

To summarize: advertising the Nix concepts is tough. While I was at the NixOS stand, I had to convince people passing by in just a few minutes that it is worth giving NixOS (or any of its sub projects) a try. In the following section, I will transcribe my "sales pitch explanation" of NixOS.

The pitch


NixOS is a Linux distribution built around the Nix package manager that solves package and configuration management problems in its own unique way. When installing systems running conventional Linux distributions, it is common to install the distribution itself, then install additional custom packages, modify configuration files, and so on -- often a tedious, time consuming and error prone process.

In NixOS, the deployment of an entire system is fully automated. Deployment is driven by a declarative configuration file capturing the desired properties of a system, such as the hard drive partitions, the services to run (such as OpenSSH or the Apache webserver), the desktop (e.g. KDE, GNOME or Xfce) and end-user packages (e.g. Emacs and Mozilla Firefox). With just one single command-line instruction, an entire system configuration can be deployed. By adapting the declarative configuration file and running the same command-line instruction again, an existing configuration can be upgraded.
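
To give an impression, the sketch below shows what such a configuration file could look like. The option names follow the NixOS module system, but the selection of services and packages is purely illustrative:

{ config, pkgs, ... }:

{
  # Illustrative boot/hard drive settings
  boot.loader.grub.device = "/dev/sda";

  # System services
  services.openssh.enable = true;
  services.httpd.enable = true;
  services.httpd.adminAddr = "admin@example.org";

  # Desktop environment
  services.xserver.enable = true;
  services.xserver.desktopManager.kde4.enable = true;

  # End-user packages
  environment.systemPackages = with pkgs; [
    emacs
    firefox
  ];
}

Deploying such a configuration then boils down to running nixos-rebuild switch as root.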

NixOS has a couple of other nice properties as well. Upgrading is always safe, so there is no reason to worry that an interruption will break a system. Moreover, older system configurations are retained by default, and if an upgrade, for example, makes a system unbootable, you can always switch back to any available older configuration. Also, configurations can be reproduced on any system by simply providing the declarative configuration file to someone else.
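
For example, switching back can be done from the command line (older configurations also appear as entries in the GRUB boot menu):

$ # Switch back to the previous system configuration
$ nixos-rebuild switch --rollback

$ # Inspect which system generations are available
$ nix-env --list-generations -p /nix/var/nix/profiles/system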

Several tools in the Nix project extend this deployment approach to other areas: NixOps can be used to deploy a network of NixOS machines in the cloud, Hydra is the Nix-based continuous integration server, and Disnix deploys services into a network of machines. Furthermore, the Nix package manager -- which serves as the basis for all of these tools -- can also be used on any Linux distribution and on a few other operating systems as well, such as Mac OS X.

Concluding remarks


The above pitch does not reveal much about NixOS' technical aspects, but simply focuses on its key aspect -- fully automated deployment -- and some powerful quality properties. This often leads to more questions from people passing by, but I consider that a good thing.

This year's FOSDEM was a very nice experience. I'd like to thank all the fellow Nixers who did the organisation work for the stand. As a matter of fact, apart from doing some promotion work at the stand, I was not involved in any of its organizational aspects. Besides having a stand to promote our project, Nicolas Pierron gave a talk about NixOS in the distributions devroom. I also enjoyed Larry Wall's talk about Perl 6 very much.


I'm looking forward to seeing what next year's FOSDEM will bring us!

Thursday, January 29, 2015

Agile software development: my experiences

In a couple of older blog posts, I have reported on my experiences with companies, such as the people I talked to at the LAC conference and during my job searching month. One of the things that I have noticed is that nearly all of them were doing "Agile software development", or at least they claimed to do so.


At the LAC conference, Agile seemed to be one of the hottest buzzwords, and every company had its own success story explaining how much Agile methodologies had improved its business and the quality of the systems it delivers.

The most popular Agile software development methodology nowadays is probably Scrum. All the companies that I visited in my job searching month claimed to have implemented it in their organisation. In fact, I haven't seen any company recently that is intentionally not using Scrum or any other Agile software development methodology.

Although many companies claim to be Agile, I still have the impression that the quality of software systems and the ability to deliver software in time and within budget haven't improved that much in general, although there are some exceptions, of course.

What is Agile?


I'm not an expert in Agile software development. One of the first things I wanted to discover is what "Agile" actually means. My feeling is that only a few people have an exact idea, especially non-native English speakers, such as people living in my country -- the Netherlands. To me, it looks like most developers with a Dutch mother tongue use this buzzword as if it's something as common as ordering a hamburger in a restaurant, without consciously thinking about its meaning.

According to the Merriam-Webster dictionary, agile means:

  1. marked by ready ability to move with quick easy grace <an agile dancer>
  2. having a quick resourceful and adaptable character <an agile mind>

The above definition is a bit abstract, but contains a number of interesting keywords. To me it looks like if some person or object is Agile, then it has a combination of the following characteristics: quick, easy, resourceful, and adaptable.

Why do we want/need to be Agile in software development?


It's generally known that many software development projects partially or completely fail for many reasons, such as:
  • The resulting system is not delivered in time or cannot be delivered at all.
  • The resulting system does not do what the customer expects, a.k.a. a mismatch of expectations. Apart from customers, this also happens internally in a development team -- developers may implement something totally different from what a designer intended.
  • There is a significant lack of quality, such as in performance or security.
  • The overall project costs (way) too much.

These issues are caused by many factors, such as:

  • Wrong estimations. It is difficult (or sometimes impossible) to estimate how much time something will take to implement. For example, I have encountered a few cases in my past career in which something took double or even ten times the amount of time that was originally estimated.
  • Unclarity. Sometimes a developer thinks he has a good understanding of what a particular feature should look like, but after implementing it, it turns out that many requirements were incorrectly interpreted (or sometimes even overlooked), requiring many revisions and extra development time.
  • Interaction problems among team members. For example, one particular developer cannot complete his task because of a dependency on another developer's task which has not been completed yet. Also, there could be a mismatch of expectations among team members. For example, a missing feature that has been overlooked by one developer blocking another developer.
  • Changing requirements/conditions. In highly competitive environments, a competitor may implement the missing features that a certain class of customers wants, making it impossible to sell your product. Another example could be Apple changing its submission requirements for the App Store, making it impossible to distribute an application to iPhone/iPad users unless the new requirements have been met.
  • Unpredictable incidents and problems. For example, a team member gets sick and is unavailable for a while. The weather conditions are so bad (e.g. lots of snowfall) that people can't make it to the office. A production server breaks down and needs to be replaced, forcing the organisation to invest money to buy a new one and time to get it configured.
  • Lack of resources. There is not enough manpower to do the job. Specialized knowledge is missing. A server application requires many more system resources than expected, e.g. more RAM, more disk space, etc.

Ideally, these problems should be prevented in a software development project. However, since this ideal is hard to achieve, it is also highly desirable to be able to respond to them as quickly as possible and without too much effort, to prevent the corresponding problems from growing out of hand. That is why being Agile (e.g. quick, easy, resourceful, and adaptable) in software development is often not only wanted, but also necessary, in my opinion.

Agile manifesto


The "definition of Agile" has been "translated" to software development by a group of practitioners into something that is known as the Agile manifesto. This manifesto states the following:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

The Agile manifesto looks very interesting, but when I compare it to the definition of agile provided by the Merriam-Webster dictionary, I don't see any of its characterizing keywords (such as adaptable and easy) in the text at all, which looks quite funny to me. The only piece that has some kind of connection is "Responding to change" (which relates to adaptable), but that is pretty much all they have in common.

Interpretation


This observation makes me wonder: how is the Agile manifesto going to help us become more agile in software development and, more importantly, how should we interpret it?

Because it states that the items on the left have more value than the items on the right, I have seen many people treating the items on the right as not relevant at all. As a consequence, I have seen the following things happen in practice:

  • Not thinking about a process. For example, in one of my past projects, it was common to create Git branches for all kinds of vaguely related tasks in an unstructured manner, because that nicely integrated with the issue tracker system. Furthermore, merging was done at unpredictable moments. As a consequence, it often came with painful merge conflicts that were hard to resolve, making the integration process tedious and much more time consuming than necessary.
  • Not documenting anything at all. This is not about writing down every detail per se, but rather about documenting a system from a high level perspective to make the basics clear to everyone involved in the development process, for example in a requirements document.

    I have been involved in quite a few projects in which we just started implementing something without writing anything down at all, "trusting" that it would eventually turn out right. So far, it has always taken us many more iterations than if most of the simple, basic details had been clear from the beginning. For example, some basic domain knowledge that sounds obvious may turn out not to be that obvious at all.
  • Not having an agreement with the customer. Some of the companies I worked for did integration with third party (e.g. the customer's) systems. What, for example, if you are developing the front-end and some error occurs because of a bug in the customer's system? Who's going to get blamed? Typically, it's you, unless you can prove otherwise. Moreover, unclear communication may also result in wrong expectations, typically extending the development time.
  • Not having a plan at all. Of course being flexible with regard to changes is good, but sometimes you also have to commit yourself to something, because otherwise changes might completely alter the current project's objectives. A right balance must be found between these two, or you might end up in a situation like this. It also happened to me a few times when I was developing web sites as a teenager.

From my perspective, the Agile manifesto does not say that the emphasis should lie on the left items only. In fact, I think the items on the right are still important. However, in situations where things are unclear or when pressure arises, the items on the left should take precedence. I'm not sure if this is something the authors of the manifesto intended to communicate, though.

For example, while developing a certain aspect of a system, it would still make sense to me to write the corresponding requirements down so that everyone involved knows about them. However, writing every possible detail down often does not make sense, because many details are typically not known yet or subject to change anyway. In these kinds of situations, it would be better to proceed working on an implementation, validate it with the stakeholders and refine the requirements later.

The same applies, in my opinion, to customer collaboration. An agreement should be made, but of course, there are always unforeseen things that both parties did not know of. In such situations it is good to be flexible, but flexibility should not come at any price.

Why agile?


What exactly is Agile about finding a right balance between these items? I think that in ideal situations, having a formalized process that exactly describes what should be done, documentation that captures everything, a solid contract that does not have to be changed, and a plan that is known to work, is the quickest and easiest path to getting software implemented.

However, since unpredictable and unforeseen things always happen, these might get in your way and you have to be flexible. In such cases, you must be adaptable by giving the items on the left precedence. I don't see, however, what's resourceful about all of this. :-)

So is this manifesto covering enough to consider software development "Agile" if it is done properly? Not everybody agrees! For example, there is also the More Agile Manifesto that covers organisations, not teams. Kent Beck, one of the signatories of the Agile manifesto, wrote an evolved version. Zed Shaw considers it all to be nonsense and simply says that people should do programming and nothing should get in their way.

I'm not really a strong believer in anything. I want to focus on facts rather than on idealism.

Scrum


As I have explained earlier, nearly all the companies that I visited during my job searching month, as well as my current employer, have implemented (or claim to have implemented) Scrum in their organisation. According to the Scrum guide, Scrum is actually not a methodology, but rather a process framework.

In a process implementing Scrum, development is iterative and divided into so-called sprints (that take 2-4 weeks). At the end of each sprint, an increment is delivered that is considered "done". Each sprint includes the following activities:

  • The Sprint planning meeting is held at the beginning of each sprint, in which the team discusses how and when to implement certain items from the product backlog.
  • Daily scrum is a short meeting held at the beginning of every development day in which team members briefly discuss the progress made the day before, the work that needs to be done in the next 24 hours, and any potential problems.
  • The Sprint review activity is held at the end of the sprint, in which the team demonstrates what has been done and stakeholders review and reflect on it. Furthermore, future goals are set during this meeting.
  • Finally, the Sprint retrospective meeting is held in which team members discuss what can be improved with regard to people, relationships, process, and tools in future sprints.

In a Scrum process, two kinds of "lists" are used. The product backlog contains the items that need to be implemented to complete the product. The sprint backlog contains the items reflecting the work that needs to be done to deliver the increment.

Teams typically consist of 3-9 persons. Scrum defines only three kinds of team member roles: the product owner (responsible for maintaining the product backlog and validating it), the Scrum master (who guards the process and takes away anything that blocks developers), and developers.

The Scrum guide makes no distinction between specific developer roles, because (ideally) every team member should be able to take over each other's work if needed. Moreover, teams are self-organizing, meaning that it's up to the developers themselves (and nobody else) to decide who does what and how things are done.

Why agile?


I have encountered quite a few people saying: "Hey, we're doing Scrum in our company, so we're Agile!", because they appear to have some sort of process reflecting the above listed traits. This makes me wonder: how is Scrum going to help and what is so agile about it?

In my opinion, most of its aspects facilitate transparency (such as the four activities), to prevent certain things from going wrong or too much time from being wasted because of misunderstandings. It also facilitates reflection, with the purpose of adapting and optimizing the development process in future sprints.

Since Scrum only loosely defines a process, the activities defined by it (sort of) make sense to me, but it also deliberately leaves some things open. As I mentioned earlier, a completely predictable process would be the quickest and easiest way to do software development, but since that ideal is hard to achieve because of unpredictable/unforeseen events, we need some flexibility too. We must find a balance, and that is what Scrum (sort of) does by providing a framework that still gives an adopter some degree of freedom.

A few things that come to mind with regard to a process implementation are:

  • How to specify "items" on the product and sprint backlogs? Although the Scrum guide does not say anything on how to do this, I have seen many people using a so-called "user-story format" in which they describe items in a formalism like "As a <user role> I want to <do some activity / see something etc. >".

    From my point of view, a user story (sort of) reflects a functional or non-functional requirement or a combination of both. However, it is typically only an abstract definition of something that might not cover all relevant details. Moreover, it can also be easily misinterpreted.

    Some people have told me that writing more formal requirements (e.g. by adhering to a standard, such as the IEEE 830-1998 standard for software requirement specifications) is way too formal, too time consuming and "unagile".

    IMHO, it really depends on the context. In some projects, the resulting product has only simple requirements (that do not even have to be met fully) and in others more difficult ones. In the latter case, I think it pays off to think about requirements more thoroughly, rather than having to revise the product many times. Of course, a right balance must be found between specifying and implementing.
  • When is something considered "done"? I believe this is one of the biggest ongoing discussions within the Scrum community, because the Scrum guide intentionally leaves the meaning of this definition open to the implementer.

    Some questions that I sometimes think about are: Should something be demonstrated to stakeholders? Should it also be tested thoroughly (e.g. all automated test cases must pass and the coverage should be acceptable)? Can we simply run a prototype on a development machine or does the increment have to be deployed to a production environment?

    All these questions cannot be uniformly answered. If the sprint goal is a prototype, then the meaning of this definition is probably different than for a mission critical product. Furthermore, accomplishing all the corresponding tasks to consider something done might be more complicated than expected, e.g. software deployment is often a more difficult problem than people think.
  • How to effectively divide work among team members and how to compose teams? If, for example, people have to work on a huge monolithic code base, then it is typically difficult to compose two teams working on it simultaneously, because they might apply conflicting changes that slow things down and may break the system. This could also happen between individual team members. To counter this, modularization of a big codebase helps, but accomplishing that is anything but trivial.
  • According to the Scrum guide, each developer is considered equal, but how can we ensure that one developer is capable of taking over another developer's work? That person needs to have the right skills and domain specific knowledge. For the latter aspect it is also important to have something documented, I guess.
  • How to respond to unpredictable events during a sprint? Should it be cancelled? Should the scope be altered?

In practice, I have not seen that many people consciously thinking about the implementation of these aspects in a Scrum process at all. They are either too concerned with the measurable aspects of a process (e.g. is the burndown chart, which reflects the amount of work remaining, looking OK?), or with the tools that are being used (e.g. should we add another user story?).

IMHO, Scrum solves and facilitates certain things that help you to be Agile. But actually being Agile is a much broader concern and a more difficult question to answer. Moreover, this question also needs to be continuously evaluated.

Conclusion


In this blog post, I have written about my experiences with Agile software development. I'm by no means an expert or a believer in any Agile methodology.

In my opinion, what being agile actually means and how to accomplish it is a difficult question that must be continuously evaluated. There is no catch-all solution.

References


I gained most of my inspiration for this blog post from my former colleague's (Rini van Solingen) video log named: "Groeten uit Delft" that covers many Scrum and Agile related topics. I used to work for the same research group (SERG) at Delft University of Technology.

Tuesday, December 30, 2014

Fourth annual blog reflection

Today it's exactly four years ago that I started this blog, so again it's an interesting opportunity to reflect on last year's writings.

Software deployment


As usual, the majority of blog posts written this year were software deployment related. In the mobile application area, I have developed a Nix function allowing someone to build Titanium apps for iOS and Android, revised the Nix iOS build function to use the new simulator facilities of Xcode 6, did some nice tricks to get existing APKs deployed in the Android emulator, and described an approach allowing someone to do wireless ad-hoc distributions of iOS apps with Hydra, the Nix-based continuous integration server.

A couple of other deployment blog posts were JavaScript related. I have extended NiJS with support for asynchronous package specifications, which can be used both for compilation to Nix expressions and for standalone execution by NiJS directly. I advertised the improved version as the NiJS package manager and successor of the Nix package manager on April Fools' Day. I received lots of hilarious comments that day! Some of them included thoughts and comments that I could not possibly have come up with myself!

The other JavaScript related deployment blog post was about my reengineering effort of npm2nix that generates Nix expressions from NPM package specifications. The original author/maintainer relinquished his maintainership, and I became a co-maintainer of it.

I also did some other deployment stuff such as investigating how Nix and Hydra builds can be backed up and describing how packages can be managed outside the Nixpkgs tree.

Finally, I have managed to get a more theoretical blog post finished earlier today in which I explore some terminology and mappings between them to improve software deployment processes.

IFF file format experiments


I also spent a bit of time on my fun project involving IFF file formats. I have ported the ILBM viewer and 8SVX player applications from SDL 1.2 to 2.0. I was a bit puzzled by one particular aspect -- namely: how to continuously render 8-bit palettized surfaces, so I have decided to write a blog post about it.

Another interesting thing I did was porting the project to Visual C++ so that the applications can run on Windows natively. I wrote a blog post about a porting strategy and an improvement to the Nix build function that can be used to build Visual Studio projects.

Research


Although I have left academia, there is still something interesting to report about research this year. In the past, we worked on a dynamic build analysis approach to discover license constraints (also covered in Chapter 10 of my PhD thesis). Unfortunately, all our paper submission attempts were rejected and eventually we gave up on publishing it.

However, earlier in April this year, one of our peers decided to give it another shot and got Shane McIntosh on board. Shane McIntosh and I put a considerable amount of effort into improving the paper, which we titled "Tracing software build processes to uncover license compliance inconsistencies". We submitted the improved paper to ASE 2014. The good news is: the paper got accepted! I'm glad to find out that someone can show me that I can be wrong sometimes! :-)

Miscellaneous stuff


I also spent some time reviving an old dormant project that helps me consistently organise website layouts, because I had found some use for it, and releasing it as free and open source software on GitHub.

Another blog post I'm very proud of is about structured asynchronous programming in JavaScript. From my experience with Node.js, I observed that to make server applications work smoothly, you must "forget" about certain synchronous programming constructs and replace them with asynchronous alternatives. Besides the blog post, I also wrote a library implementing the abstractions.

Blog posts


As with my previous annual reflections, I will also publish the top 10 of my most frequently read blog posts:

  1. On Nix and GNU Guix. As with the previous two annual reflections, this blog post remains on top and will probably stay at that position for a long time.
  2. An alternative explanation of the Nix package manager. Also this blog post's position remains unchanged since the last two reflections.
  3. Composing FHS-compatible chroot environments with Nix (or deploying Steam in NixOS). This blog post has moved to the third position and that's probably because of the many ongoing discussions on the Internet about Nix and the FHS, and the discussion whether NixOS can run Steam.
  4. Using Nix while doing development. This post also gained a bit more popularity since last year, but I have no idea why.
  5. Setting up a Hydra build cluster for continuous integration and testing (part 1). A blog post about Hydra from an end user perspective that still remains popular.
  6. Setting up a multi-user Nix installation on non-NixOS systems. This blog post is also over one year old and has entered the all time top 10. This clearly indicates that the instructions in the Nix manual are still unclear and this feature is wanted.
  7. Asynchronous programming with JavaScript. Another older blog post that got some exposure on some discussion sites and entered the all time top 10 as a consequence.
  8. Second computer. Still shows that the good ol' Amiga remains popular! This blog post has been in the all-time top 10 since the first annual blog reflection.
  9. Yet another blog post about Object Oriented Programming and JavaScript. Yet another older blog post that was suddenly referenced by a Stack Overflow article. As a consequence, it entered the all time top 10.
  10. Wireless ad-hoc distributions of iOS applications with Hydra. This is the only blog article I wrote this year that ended up in the all-time top 10. Why it is so popular is a mystery to me. :-)

Conclusion


I'm still not out of ideas and there will be more stuff to report about next year, so stay tuned! The remaining thing I'd like to say is:

HAPPY NEW YEAR!!!

On the improvement of software deployment processes and some definitions

Some time ago, I wrote a blog post about techniques and lessons to improve software deployment processes. The take-home message of that blog post was that in order to improve deployment processes, you must automate everything from the very beginning of a software development process and properly decompose the process into sub units to make it more manageable and efficient.

In this blog post, I'd like to dive a bit deeper into the latter aspect by exploring some definitions of "decomposition units" in the literature and by deriving mappings between them.

Software projects


The first "definition" that I want to mention is the software project, for which I (interestingly enough) could not find anything in the literature. The reason why I start with this term is that software deployment issues often already appear in the early stages of a software development process.

The term "software project" is something that is hard to define formally, IMHO. To me, software projects typically manifest themselves as directories of files that I can divide into the following categories:

  • Executable code. Files typically containing code implementing a program that performs computation and manipulates data.
  • Resources/data. Files not implementing anything that is executed, which are used or referenced by the program, such as images, configuration files, video, audio, HTML pages, etc.
  • Build configuration files. Configuration files used by a build system that transform or change the files belonging to the earlier two categories.

    For example, executable code is often implemented in higher level programming languages and must be compiled to object code so that the program can be executed. Also many kinds of other processing steps can be executed, such as scaling images to lower resolutions, obfuscating/minifying code, running a style checker, bundling object code and resources etc.

Sometimes it is hard to draw a hard line between executable code and data files. For example, it may be possible that a data artifact (e.g. an HTML page) includes executable code (e.g. embedded JavaScript), and the other way around, such as assembly code containing strings in their code sections for efficiency.

Software projects can often be conveniently created by an Integrated Development Environment (IDE) that typically provides useful templates and automatically fills in many boilerplate settings. However, for small projects, people frequently create software projects manually, for example, by manually creating a directory of source files with a Makefile.

It is probably obvious that dealing with software deployment complexity requires automation, and thus that files belonging to the third category (build configuration files) must be provided. Yet, I have seen quite a few projects in the past in which nothing is automated and people still rely on manually executing build tasks in an IDE, which is often tedious, time consuming and error prone.

Software modules


An automated build process of a software project provides a basic and typically faster means of (re)producing releases of a software product and is often less error prone than a manual build process.

However, besides build process automation there could still be many other issues. For example, if a software project has a monolithic build structure in which nothing can be built separately, deployment times become unnecessarily long and their configurations often have a huge maintenance complexity. Also, upgrading an existing deployment is typically difficult, expensive and unreliable.

To improve the efficiency of build processes, we need to decompose them into units that can be built separately. An important prerequisite to accomplish build decomposition is functional separation of important aspects of a software project.

A relatively simple concept supporting functional separation is the software module. According to Clemens Szyperski's "Component Software" book, a software module is a unit that has the following characteristics:

  • A module implements an ADT (Abstract Data Type).
  • It encapsulates multiple entities, often classes, but sometimes other kinds of entities, such as functions.
  • It has no concept of instantiation; in other words, there is one and only one instance of a module.

Several programming languages have a notion of modules, such as Modula-2, Ada, C# and Java (since version 9). Sometimes the module concept is named differently in these languages. For example, in Ada modules are called packages and in C# they are called assemblies.

Not all programming languages support modular programming. Sometimes external facilities must be used, such as CommonJS in JavaScript. Moreover, modules can also be "simulated" in various ways, such as with static classes or singleton objects.

Encapsulating functionality into modules also typically imposes a certain filesystem structure for organizing the source code files. In some contexts, a module must correspond to a single file (e.g. in CommonJS) and in others to directories of files following a certain convention (e.g. in Java the names of directories should correspond to the package names, and the names of regular files to the name of the enclosing type in the code). Sometimes files belonging to a module can also be bundled into a single archive, such as a Zip container (e.g. a JAR file) or library file (e.g. *.dll or *.so files).

Refactoring a monolithic codebase into modules in a meaningful way is anything but trivial. According to the paper "On the criteria to be used in decomposing systems into modules" written by David Parnas, it is a good practice to minimize coupling between modules (i.e. the dependencies between modules should be minimized) and maximize cohesion within modules (i.e. strongly related things should belong to the same module).

Software components


The biggest benefit of modularization is that parts of the code can be effectively reused. Reuse of software assets can be improved even further by turning modules (that typically work on code level) into software components that work on system level. Clemens Szyperski's "Component Software" book says the following about them:
The characteristic properties of a component are that it:

  • is a unit of independent deployment
  • is a unit of third-party composition
  • has no (externally) observable state

The above characteristics have several implications:

  • Independent deployment means that a component is well separated from the environment and other components, is never deployed partially, and third parties should not require access to its construction details.
  • To allow third-party composition, a component must be sufficiently self contained and have clear specifications of what it provides and what it requires. In other words, it interacts with the environment through well defined interfaces.
  • No externally observable state means that no distinction can be made between multiple copies of components.

So in what way are components different from modules? From my point of view, modularization is a prerequisite for componentization, and some modules may already qualify as minimal components.

However, notable differences between modules and components are that the former are allowed to have observable state (e.g. global variables that are imperatively modified) and may have dependencies on implementations rather than interfaces.

Furthermore, to implement software components standardized component models are frequently used, such as CORBA, COM, EJB, or web services (e.g. SOAP, WSDL, UDDI) that provide various kinds of facilities, such as (some sort of) a platform independent interface, lookup and discovery. Modules typically use the interface facilities provided by a programming language.

Build-Level Components


Does functional separation of a monolithic codebase into modules and/or components also improve deployment? According to Merijn de Jonge's IEEE TSE paper titled "Build-Level components", this is not necessarily true.

For example, it may still be possible that source code files implementing modules or components on a functional level are scattered across directories of source code files. For example, many references may exist between the directories in a codebase (strong coupling) and directories often contain too many files (weak cohesion).

According to the paper, strong coupling and weak cohesion on the build level have the following disadvantages:
  1. potentially reusable code, contained in some of the entangled modules, cannot easily be made available for reuse;
  2. the fixed nature of directory hierarchies makes it hard to add or to remove functionality;
  3. the build system will easily break when the directory structure changes, or when files are removed or renamed.

In the paper, the author shows that Component-Based Software Engineering (CBSE) principles can be applied to the build level as well. Build-level components can be formed by directories of source files and serve as units of composition. Access occurs via build, configuration, and requires interfaces:

  • The build interface defines which build operations to execute. In a GNU Autotools project following the GNU Coding Standards (used in the paper), these operations correspond to a number of standardized make targets, e.g. make all, make install, make dist.
  • The configuration interface defines which variability points and parameters can be enabled or disabled. In a GNU Autotools project, this interface corresponds to the --enable-foo and --disable-foo parameters passed to the configure script -- each enable or disable parameter defines a certain feature that can be enabled or disabled.
  • The requires interface can be used to bind dependencies to components. In a GNU Autotools project, this interface corresponds to the --with-foo and --without-foo parameters passed to the configure script, which take the paths to the corresponding dependencies as parameters, allowing the configuration script to find them.
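
To make these interfaces a bit more tangible, the following hypothetical GNU Autotools session (the component and flag names are made up for illustration) exercises all three of them:

$ # Requires interface: bind the libfoo dependency to this component
$ # Configuration interface: enable an optional feature
$ ./configure --with-libfoo=/opt/libfoo --enable-fancy-feature

$ # Build interface: run the standardized build operations
$ make all
$ make install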

Although the paper only uses GNU Autotools-based projects for implementation purposes, build-level components are not restricted to any particular build technology -- the only thing that matters is that the operations for these three interfaces are standardized, so that any component can be configured, composed, and built uniformly.

The paper describes a collection of smells and some refactor patterns that need to be applied to turn directories of source files into build level components. The rules mentioned in the paper are the following:
  1. Components with directory granularity
  2. Circular dependencies should be prevented
  3. Software building via standardized build interface
  4. Compile-time variability binding via standardized configuration interface
  5. Late binding of dependencies via require interface
  6. Build process definition per component
  7. Configuration process definition per component
  8. Component deployment with build level-packages
  9. Automated component composition

Software packages


As described in the previous sections, functional separation is a prerequisite for composing build-level components. One important aspect of build-level components is that the build processes of modules and components are separated. But how does build separation affect the overall deployment process (to which the build phase also belongs)?

Many deployment processes are typically carried out by tools called package managers. Package managers install units that are called software packages. According to the paper: "Package Upgrades in FOSS Distributions: Details and Challenges" written by Di Cosmo et al (HotSWUp 2008), a software package can be defined as follows:
Packages are abstractions defining the granularity at which users can act (add, remove, upgrade, etc.) on available software.

According to the paper, a package is typically a bundle of three parts:

  • Set of files. Contains all kinds of files that must be copied somewhere to the host system to make the software work, such as scripts, binaries, resources etc.
  • Set of valued meta-information. Contains various kinds of meta attributes, such as the name of the package, the version, a description and its license. Most importantly, it contains information about the inter-package relationships, which includes a set of dependencies on other packages and a set of conflicts with other packages. Package managers typically install a package's required dependencies automatically and refuse to install it if a conflict has been encountered.
  • Executable configuration scripts (also known as maintainer scripts). These are basically scripts that imperatively "glue" files from the package to files already residing on the system. For example, after a certain package has been installed, some configuration files of the host system are automatically adapted so that it can be used properly.
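
As an illustration of the first two parts, many package managers can show a package's files and meta-information from the command line. On a Debian-based system, for example (the package name is just an example):

$ # Show the meta-information, including dependencies and conflicts
$ dpkg -s openssh-server

$ # Show the set of files that the package has installed
$ dpkg -L openssh-server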

Getting a software project packaged typically involves defining the meta data (including the dependencies/conflicts on external packages), bundling the build process (for source package managers) or the resulting build artifacts (for binary package managers), and composing maintainer scripts taking care of the remaining bits to make the package work (although I would personally not recommend using these kinds of scripts).

This process already works for big monolithic software projects. However, it has several drawbacks for these kinds of projects. Since a big project must be deployed as a whole, deployment is typically an expensive process. Not only does a fresh installation of a package take time, but upgrading does too, since an existing installation has to be replaced as a whole instead of only the affected areas.

Moreover, upgrading is also quite dangerous. Many package managers typically replace and remove files belonging to a package that reside in global locations on the filesystem, such as /usr/bin, /usr/lib (on Linux) or C:\WINDOWS\SYSTEM32 (on Windows). If an upgrade process gets interrupted, the system might reach an inconsistent state for which it might be difficult (or impossible) to do a rollback. The bigger a project is the more severe the potential damage becomes.

Packaging smaller units of a software project (e.g. a build-level component) is typically more work, but also has great benefits. It allows certain, smaller pieces of a software projects to be replaced separately, significantly increasing the efficiency and reliability of the upgrades. Moreover, the dependencies of software components and build-level components have already been identified and only need to be translated to the corresponding packages that provide them.

Nix packages


I typically use the Nix package manager (and related tools) for deployment activities. It borrows concepts from purely functional programming languages to make deployment reliable, reproducible and efficient.

In what way do packages deployed by Nix conform to the definition of software package shown earlier?

Deployment in Nix is driven by build recipes (called Nix expressions) that build packages, including all their dependencies, from source. Every package build (indirectly) invokes the derivation {} function, which composes an isolated environment in which builds are executed in such a way that only the declared dependencies can be found and nothing else can influence the build. The function arguments include package metadata, such as a description, license and maintainer, as well as the package dependencies.
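
For illustration, the following sketch shows what a Nix expression for a package could look like. In practice, builds go through abstractions such as stdenv.mkDerivation, which eventually invokes derivation {}; the hash below is a placeholder:

{ stdenv, fetchurl }:

stdenv.mkDerivation {
  name = "hello-2.9";

  # The source tarball is a declared dependency with a fixed hash
  src = fetchurl {
    url = "mirror://gnu/hello/hello-2.9.tar.gz";
    sha256 = "0000000000000000000000000000000000000000000000000000"; # placeholder
  };

  # Package metadata
  meta = {
    description = "A program that produces a familiar, friendly greeting";
    license = stdenv.lib.licenses.gpl3Plus;
  };
}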

References to dependencies in Nix are exact, meaning that they bind to specific builds of other Nix packages. Conventional package managers, software components and build-level components typically use nominal version specifications consisting of the names and version numbers of packages, which are less strict. Mapping nominal dependencies to exact dependencies is not always trivial. For example, nominal version ranges are unsupported in Nix and must be snapshotted. An earlier blog post that describes how to deploy NPM packages with Nix has more details about this.

Another notable trait of Nix is that it has no notion of conflicts. In Nix, any package can coexist with any other, because they are all stored in isolated directories. However, declared conflicts may also indicate runtime conflicts between two packages. These kinds of issues need to be solved by other means.

Finally, Nix packages have no configuration (or maintainer) scripts, because these imperatively modify the system's state, which conflicts with Nix's underlying purely functional deployment model. Many things that configuration scripts typically do are accomplished in a different way when Nix is used for deployment. For example, configuration files are not adapted, but generated in a Nix expression and deployed as a Nix package. Service activation is typically done by generating a job description file (e.g. an init script or systemd job) that starts and stops it.
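
As a sketch of this idea: instead of imperatively adapting a configuration file in, say, /etc, the file is generated as a package in the Nix store (the file name and settings below are hypothetical):

{ pkgs }:

pkgs.writeText "myapp.conf" ''
  # Hypothetical configuration file, generated into the Nix store
  listen-port = 8080
  data-dir = /var/lib/myapp
''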

NixOS configurations, Disnix configurations, Hydra jobsets


If something is packaged in a Nix expression, you can easily broaden the application area of deployment:

  • With a few small modifications (mainly encapsulating several packages into a jobset), a Nix package can be turned into a Hydra jobset, so that a project can be integrated and tested continuously.
  • A package can be referenced from a NixOS module that, for example, automatically starts and stops it on startup and shutdown (a minimal sketch of such a module follows this list). NixOS can be used to deploy entire system configurations from a single declarative specification in which the module is enabled.
  • A collection of NixOS configurations can also be deployed in a network of physical or virtual machines through NixOps.
  • A package can be turned into a service by adding a specification of inter-dependencies (services that may reside on other machines in a network). These services can be used to compose a Disnix configuration that deploys services to machines in a network.
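
As a minimal sketch of the NixOS module mentioned in the second point, assuming a hypothetical package pkgs.myapp that provides a bin/myapp executable:

{ config, lib, pkgs, ... }:

with lib;

{
  # Option to enable or disable the hypothetical service
  options.services.myapp.enable = mkEnableOption "myapp";

  config = mkIf config.services.myapp.enable {
    # Make the package available system-wide
    environment.systemPackages = [ pkgs.myapp ];

    # Generate a systemd job that starts and stops the service
    systemd.services.myapp = {
      description = "Hypothetical myapp service";
      wantedBy = [ "multi-user.target" ];
      serviceConfig.ExecStart = "${pkgs.myapp}/bin/myapp";
    };
  };
}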

Summary


I can summarize all the terms described in this blog post and the activities that need to be performed to implement them in the following chart:


Concluding remarks


In this blog post, I have described some terminology and potential mappings between them with the purpose of defining a reengineering process that makes deployment processes more manageable and efficient.

The terms and mappings used in this blog post are quite abstract. However, if we make a number of concrete technology choices, e.g. a programming language (Java), component technology (web services), package manager (Nix), we can define a more concrete process allowing someone to make considerable improvements.

Moreover, the terms described in this blog post are idealistic. In practice, most units that are called modules or components do not fully qualify as such, while it is still possible to package and deploy them individually. Perhaps it would also be useful to make "weaker" definitions of some of the terms described in this blog post and to look for their corresponding minimum requirements.

Finally, we can also look into more refactoring/reengineering patterns for the other terms and into possible automation of them.

Thursday, October 30, 2014

Deploying iOS applications with the Nix package manager revisited

Previously, I have written a couple of blog posts about iOS application deployment. For example, I have developed a Nix function that can be used to build apps for the iOS simulator and real iOS devices, made some testability improvements, and implemented a dirty trick to make wireless ad-hoc distributions of iOS apps possible with Hydra, the Nix-based continuous integration server.

Recently, I made some major changes to the Nix build function, which I will describe in this blog post.

Supporting multiple Xcode versions


Xcode 6.0 and later versions do not support iOS SDK versions below 8.0. Sometimes, it might still be desirable to build apps against older SDKs, such as 7.0. To be able to do that, we must install older Xcode versions alongside newer ones.

As with recent Xcode versions, we must install older Xcode versions manually first and create a Nix proxy function to use them. DMG files for older Xcode versions can be obtained from Apple's developer portal.

When installing a second Xcode DMG, you typically get a warning that looks as follows:


The installer attempts to put Xcode in its standard location (/Applications/Xcode.app), but if you click on 'Keep Both' then it is installed in a different path, such as /Applications/Xcode 2.app.

I modified the proxy function (described in the first blog post) in such a way that the version number and path to Xcode are configurable:

{ stdenv
, version ? "6.0.1"
, xcodeBaseDir ? "/Applications/Xcode.app"
}:

stdenv.mkDerivation {
  name = "xcode-wrapper-"+version;
  buildCommand = ''
    mkdir -p $out/bin
    cd $out/bin
    ln -s /usr/bin/xcode-select
    ln -s /usr/bin/security
    ln -s /usr/bin/codesign
    ln -s "${xcodeBaseDir}/Contents/Developer/usr/bin/xcodebuild"
    ln -s "${xcodeBaseDir}/Contents/Developer/usr/bin/xcrun"
    ln -s "${xcodeBaseDir}/Contents/Developer/Applications/iOS Simulator.app/\
Contents/MacOS/iOS Simulator"

    cd ..
    ln -s "${xcodeBaseDir}/Contents/Developer/Platforms/\
iPhoneSimulator.platform/Developer/SDKs"

    # Check if we have the xcodebuild version that we want
    if [ -z "$($out/bin/xcodebuild -version | grep -x 'Xcode ${version}')" ]
    then
        echo "We require xcodebuild version: ${version}"
        exit 1
    fi
  '';
}

As can be seen in the expression, two parameters have been added to the function definition. Moreover, only tools that a particular installation of Xcode does not provide are referenced from /usr/bin. The rest of the executables are linked to the specified Xcode installation.

We can configure an alternative Xcode version by modifying the composition expression shown in the first blog post:

rec {
  stdenv = ...;

  xcodeenv = import ./xcode-wrapper.nix {
    version = "5.0.2";
    xcodeBaseDir = "/Applications/Xcode 2.app";
    inherit stdenv;
  };

  helloworld = import ./pkgs/helloworld {
    inherit xcodeenv;
  };
  
  ...
}

As may be observed, we pass a different Xcode version number and path as parameters to the Xcode wrapper, which correspond to an alternative Xcode 5.0.2 installation.

The app can be built with Nix as follows:

$ nix-build default.nix -A helloworld
/nix/store/0nlz31xb1q219qrlmimxssqyallvqdyx-HelloWorld
$ cd result
$ ls
HelloWorld.app  HelloWorld.app.dSYM

Simulating iOS apps


Previously, I also developed a Nix function that generates build scripts that automatically spawn iOS simulator instances in which apps are deployed, which is quite useful for testing purposes.

Unfortunately, things have changed considerably in the new Xcode 6 and the old method no longer works.

I created a new kind of script that is based on the details described in the following Stack Overflow article: http://stackoverflow.com/questions/26031601/xcode-6-launch-simulator-from-command-line.

First, simulator instances must be created through Xcode. This can be done by starting Xcode and opening Window -> Devices in the Xcode menu:


A new simulator instance can be added by clicking on the '+' button on the bottom left in the window:


In the above example, I create a new instance named 'iPhone 6' that simulates an iPhone 6 running iOS 8.0.

After creating the instance, it should appear in the device list:


Furthermore, each simulator instance has a unique device identifier (UDID). In this particular example, the UDID is: 0AD5FC1C-A360-4D05-9D6A-FD719C46A149

We can launch the simulator instance we just created from the command-line as follows:

$ open -a "$(readlink "${xcodewrapper}/bin/iOS Simulator")" --args \
    -CurrentDeviceUDID 0AD5FC1C-A360-4D05-9D6A-FD719C46A149

We can provide the UDID of the simulator instance as a parameter to automatically launch it. If we don't know the UDID of a simulator instance, we can obtain a list from the command line by running:

$ xcrun simctl list
== Device Types ==
iPhone 4s (com.apple.CoreSimulator.SimDeviceType.iPhone-4s)
iPhone 5 (com.apple.CoreSimulator.SimDeviceType.iPhone-5)
iPhone 5s (com.apple.CoreSimulator.SimDeviceType.iPhone-5s)
iPhone 6 Plus (com.apple.CoreSimulator.SimDeviceType.iPhone-6-Plus)
iPhone 6 (com.apple.CoreSimulator.SimDeviceType.iPhone-6)
iPad 2 (com.apple.CoreSimulator.SimDeviceType.iPad-2)
iPad Retina (com.apple.CoreSimulator.SimDeviceType.iPad-Retina)
iPad Air (com.apple.CoreSimulator.SimDeviceType.iPad-Air)
Resizable iPhone (com.apple.CoreSimulator.SimDeviceType.Resizable-iPhone)
Resizable iPad (com.apple.CoreSimulator.SimDeviceType.Resizable-iPad)
== Runtimes ==
iOS 7.0 (7.0.3 - 11B507) (com.apple.CoreSimulator.SimRuntime.iOS-7-0)
iOS 7.1 (7.1 - 11D167) (com.apple.CoreSimulator.SimRuntime.iOS-7-1)
iOS 8.0 (8.0 - 12A365) (com.apple.CoreSimulator.SimRuntime.iOS-8-0)
== Devices ==
-- iOS 7.0 --
-- iOS 7.1 --
-- iOS 8.0 --
    iPhone 4s (868D3066-A7A2-4FD1-AF6A-25A90F480A30) (Shutdown)
    iPhone 5 (7C672CBE-5A08-481A-A5EF-2EA834E3FCD4) (Shutdown)
    iPhone 6 (0AD5FC1C-A360-4D05-9D6A-FD719C46A149) (Shutdown)
    Resizable iPhone (E95FC563-8748-4547-BD2C-B6333401B381) (Shutdown)

We can also install an app into the simulator instance from the command-line. However, to be able to install any app produced by Nix, we must first copy the app to a temp directory and restore write permissions:

$ appTmpDir=$(mktemp -d -t appTmpDir)
$ cp -r "$(echo ${app}/*.app)" $appTmpDir
$ chmod -R 755 "$(echo $appTmpDir/*.app)"

The reason why we need to do this is that Nix makes a package immutable after it has been built, by removing the write permission bits. After restoring the permissions, we can install the app in the simulator by running:

$ xcrun simctl install 0AD5FC1C-A360-4D05-9D6A-FD719C46A149 \
    "$(echo $appTmpDir/*.app)"

And launch the app in the simulator with the following command:

$ xcrun simctl launch 0AD5FC1C-A360-4D05-9D6A-FD719C46A149 \
    MyCompany.HelloWorld

Like the old simulator function, I have encapsulated the earlier described steps in a Nix function that generates a script spawning the simulator instance automatically. The example app can be deployed by writing the following expression:

{xcodeenv, helloworld}:

xcodeenv.simulateApp {
  name = "HelloWorld";
  bundleId = "MyCompany.HelloWorld";
  app = helloworld;
}

By running the following command-line instructions, we can automatically deploy an app in a simulator instance:

$ nix-build -A simulate_helloworld
/nix/store/jldajknmycjwvf3s6n71x9ikzwnvgjqs-simulate-HelloWorld
$ ./result/bin/run-test-simulator 0AD5FC1C-A360-4D05-9D6A-FD719C46A149

And this is what the result looks like:


The UDID parameter passed to the script is not required. If a UDID has been provided, the script deploys the app to that particular simulator instance. If the UDID parameter is omitted, it displays a list of simulator instances and asks the user to select one.

Conclusion


In this blog post, I have described an addition to the Nix function that builds iOS applications, supporting multiple versions of Xcode. Furthermore, I have implemented a new simulator spawning script that works with Xcode 6.

The example case can be obtained from my GitHub page.

Wednesday, October 8, 2014

Deploying NPM packages with the Nix package manager

I have encountered several people saying that the Nix package manager is a nice tool, but they do not want to depend on it to build software. Instead, they say that they want to keep using the build tools they are familiar with.

To clear up some confusion: Nix's purpose is not to replace any build tools, but to complement them by composing isolated environments in which these build tools are executed.

Isolated environments


Isolated environments composed by Nix have the following traits:

  • All environment variables are initially cleared or set to dummy values.
  • Environment variables are modified in such a way that only the declared dependencies can be found, e.g. by adding the full paths of these packages (residing in separate directories) to PATH, PERL5LIB, CLASSPATH etc.
  • Processes can only write to a designated temp folder and output folders in the Nix store. Write access to any other folder is restricted.
  • After the build has finished, the output files in the Nix store are made read-only and their timestamps are reset to 1 second after the UNIX epoch.
  • The environment can optionally be composed in a chroot environment in which no undeclared dependencies and non-package related arbitrary files on the filesystem can be accidentally accessed, no network activity is possible and other processes cannot interfere.
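
To illustrate these traits, consider the following trivial expression (a hypothetical demonstration, not taken from Nixpkgs) that only declares GNU Hello as a dependency:

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "isolation-demo";
  buildInputs = [ hello ];  # the only declared dependency
  buildCommand = ''
    # PATH only contains the standard environment and the declared
    # dependencies, so hello can be found here:
    hello

    # Writing to the designated output folder in the Nix store is permitted:
    mkdir -p $out
    hello > $out/greeting.txt
  '';
}

Building this expression with nix-build should succeed, whereas invoking any tool that has not been declared as a dependency inside buildCommand would fail.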

In these environments, you can execute many kinds of build tools, such as GNU Autotools, GNU Make, CMake, Apache Ant, SCons, Perl's MakeMaker and Python's setuptools, typically with few problems. In Nixpkgs, a collection of more than 2500 mostly free and open-source packages, we run many kinds of build tools inside isolated environments composed by Nix.

Moreover, besides running build tools, we can also do other stuff in isolated environments, such as running unit tests, or spawning virtual machine instances in which system integration tests are performed.

So what are the benefits of using such an approach as opposed to running build tools directly in an ad-hoc way? The main benefit is that package deployment (and even entire system configurations and networks of services and machines) become much more reliable and reproducible. Moreover, we can also run multiple builds safely in parallel improving the efficiency of deployment processes.

The only requirement a software project must meet is to follow a few simple rules, so that builds do not fail because of the restrictions these isolated environments impose. A while ago, I wrote a blog post on techniques and lessons to improve software deployment that gives some more details on this. Moreover, if you follow these rules, you should still be able to build your software project with your favourite build tools outside Nix.

(As a sidenote: Nix can actually also be used as a build tool, but this application area is still experimental and not frequently used. More info on this can be found in Chapter 10 of Eelco Dolstra's PhD thesis, which can be obtained from his publications page.)

Dependency management


The fact that many build tools can be complemented by Nix probably sounds good, but there is one particular class of build tools that is problematic to use with Nix -- namely build tools that also do dependency management in addition to build management. For these kinds of tools, the Nix package manager conflicts, because the build tool typically insists on taking over Nix's responsibilities as a dependency manager.

Moreover, Nix's facilities typically restrict such tools from consulting external resources. If we were to allow them to do their own dependency management tasks (which is actually possible by hacking around Nix's deployment model), then the hash codes inside the Nix store paths (which are derived from all buildtime dependencies) would no longer be guaranteed to accurately represent the same build results, undermining reliable and reproducible deployment. The fact that other dependency managers use weaker, nominal version specifications mainly contributes to that.

Second, regardless of which package manager is used, you can no longer rely on the package management system's dependency manager alone to deploy a system; you also depend on extra tools and additional distribution channels, which is generally considered tedious by software distribution packagers and end-users.

NPM package manager


A prominent example of a tool doing both build and dependency management is the Node.js Package Manager (NPM), which is the primary means within the Node.js community to build and distribute software packages. It can be used for a variety of Node.js related deployment tasks.

The most common deployment task is probably installing the NPM package dependencies of a development project. What developers typically do is enter the project's working directory and run:

$ npm install

This installs all its dependencies (obtained from the NPM registry, external URLs and Git repositories) into a special purpose folder named node_modules/ in the project workspace, so that the project can be run.

You can also globally install NPM packages from the NPM registry (such as command-line utilities), by running:

$ npm install -g nijs

The above command installs an NPM package named NiJS globally, including all its dependencies. After the installation has completed, you should be able to run the following instruction on the command-line:

$ nijs-build --help

NPM related deployment tasks are driven by a specification called package.json that is included in every NPM package or the root folder of a development project. For example, NiJS' package.json file looks as follows:

{
  "name" : "nijs",
  "version" : "0.0.18",
  "description" : "An internal DSL for the Nix package manager in JavaScript",
  "repository" : {
    "type" : "git",
    "url" : "https://github.com/svanderburg/nijs.git"
  },
  "author" : "Sander van der Burg",
  "license" : "MIT",
  "bin" : {
    "nijs-build" : "./bin/nijs-build.js",
    "nijs-execute" : "./bin/nijs-execute.js"
  },
  "main" : "./lib/nijs",
  "dependencies" : {
    "optparse" : ">= 1.0.3",
    "slasp": "0.0.4"
  }
}

The above package.json file defines a package configuration object having the following properties:

  • The name and version attributes define the name of the package and its corresponding version number. These two attributes are mandatory and if they are undefined, NPM deployment fails. Moreover, version numbers are required to follow the semver standard. One of semver's requirements is that the version attribute should consist of three version components.
  • The description, repository, author and license attributes are simply meta information. They are not used during the execution of deployment steps.
  • The bin attribute defines which executable files it should deploy and to which CommonJS modules in the package they map.
  • The main attribute refers to the module that is the primary entry point to the package when it is included through require().
  • The dependencies parameter specifies the dependencies that this package has on other NPM packages. This package depends on a library called optparse that must be of version 1.0.3 or higher, and a library named slasp that must be exactly version 0.0.4. More information on how NPM handles dependencies is explained in the next section.

Since the above package is a pure JavaScript package (which most NPM packages are), no build steps are needed. However, if a package does need to perform build steps, e.g. compiling CoffeeScript to JavaScript, or building bindings to native code, then a collection of scripts can be specified, which are run at various times in the lifecycle of a package, e.g. before and after the installation steps. These scripts can (for example) execute the CoffeeScript compiler, or invoke Gyp to compile bindings to native code.
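
For example, a package that compiles CoffeeScript sources and builds native bindings could declare such scripts as follows (a hypothetical package.json fragment; preinstall and install are standard NPM lifecycle hooks):

{
  "name": "example-native-package",
  "version": "0.0.1",
  "scripts": {
    "preinstall": "coffee --compile src/*.coffee",
    "install": "node-gyp rebuild"
  }
}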

Replacing NPM's dependency management


So how can we deploy NPM packages in an isolated environment composed by Nix? In other words: how can we "complement" NPM with Nix?

To accomplish this, we must substitute NPM's dependency manager, which conflicts with the Nix package manager, with something that does the dependency management the "Nix way", while retaining the NPM semantics and keeping its build facilities.

Luckily, we can easily do that by running NPM inside a Nix expression and "fooling" it into not installing any dependencies itself, by providing copies of these dependencies in the right locations ourselves.

For example, to make deployment of NiJS work, we can simply extract the tarball's contents, copy the result into the Nix store, enter the output folder, and copy its dependencies into the node_modules directory ourselves:

mkdir -p node_modules
cp -r ${optparse} node_modules
cp -r ${slasp} node_modules

(The above antiquoted expressions, such as ${optparse}, refer to the results of Nix expressions that build the corresponding dependencies.)

Finally, we should be able to run NPM inside a Nix expression as follows:

$ npm --registry http://www.example.com --nodedir=${nodeSources} install

When running the above command-line instruction after the copy commands, NPM detects that all the required dependencies of NiJS are already present and simply proceeds without doing anything.

We also provide a couple of additional parameters to npm install:

  • The --registry parameter prevents the NPM registry from being consulted if any dependency appears to be missing, which is undesirable. We want deployment of NPM package dependencies to be Nix's responsibility, and making deployment fail when dependency specifications are incomplete is exactly what we need to be sure that we correctly specify all required dependencies.
  • The --nodedir parameter specifies where the Node.js source code can be found, which is used to build NPM packages that have bindings to native code. nodeSources is a directory containing the unpacked Node.js source code:

    nodeSources = runCommand "node-sources" {} ''
      tar --no-same-owner --no-same-permissions -xf ${nodejs.src}
      mv node-* $out
    '';
    

  • When running NPM in a source code directory (as shown earlier), all development dependencies are installed as well, which is often not required. By providing the --production parameter, we can deploy the package in production mode, skipping the development dependencies.

    Unfortunately, there is one small problem that could occur with some packages defining a prepublish script -- NPM tries to execute this script while a development dependency might be missing, causing the deployment to fail. To remedy this problem, I also provide the --ignore-scripts parameter to npm install and only run the install scripts afterwards, through:

    $ npm run install --registry http://www.example.com --nodedir=${nodeSources}
    

Translating NPM's dependencies


The main challenge of deploying NPM packages with Nix is implementing a Nix equivalent for NPM's dependency manager.

Dependency classes


Currently, an NPM package configuration can declare the following kinds of dependencies, which we somehow have to fit into Nix's deployment model (a combined example follows this list):

  • The dependencies attribute specifies which dependencies must be installed along with the package to run it. As we have seen earlier, simply copying the package of the right version into the node_modules folder in the Nix expression suffices.
  • The devDependencies attribute specifies additional dependencies that are installed in development mode. For example, when running: npm install inside the folder of a development project, the development dependencies are installed as well. Also, simply copying them suffices to allow deployment in a Nix expression to work.
  • The peerDependencies attribute might suggest another class of dependencies that are installed along with the package, because of the following sentence in the package.json specification:

    The host package is automatically installed if needed.

    After experimenting with a basic package configuration containing only one peer dependency, I discovered that peer dependencies are basically used as a checking mechanism to verify that no incompatible versions are accidentally installed. In a Nix expression, we don't have to do any additional work to support this and we can leave the check up to the NPM instance that we run inside the Nix expression.

    UPDATE: It looks like the new NPM bundled with Node.js 0.12.0 does seem to actually install peer dependencies.

  • bundledDependencies affects the publishing process of the package to the NPM registry. The bundled dependencies refer to a subset of the declared dependencies that are statically bundled along with the package when it's published to the NPM registry.

    When downloading and unpacking a package from the NPM registry that has bundled dependencies, a node_modules folder exists that contains these dependencies, including all their dependencies.

    To support bundled dependencies in Nix, we must first check whether a dependency already exists in the node_modules folder. If this is the case, we should leave it as it is, instead of providing the dependency ourselves.
  • optionalDependencies are also installed along with a package, but do not cause the deployment to fail if any error occurs. In Nix, optional dependencies can be supported by using the same copying trick as regular dependencies. However, accepting failures (especially non-deterministic ones) is not something the Nix deployment model supports. Therefore, I did not derive any equivalent for it.
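
To summarize the above classes, the following hypothetical package.json fragment (the package names merely serve as an illustration) declares all five of them side by side:

{
  "name": "example-package",
  "version": "0.1.0",
  "dependencies": {
    "slasp": "0.0.4"
  },
  "devDependencies": {
    "mocha": "1.21.x"
  },
  "peerDependencies": {
    "optparse": ">= 1.0.3"
  },
  "bundledDependencies": [
    "slasp"
  ],
  "optionalDependencies": {
    "fsevents": "*"
  }
}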

Version specifications


There are various ways to refer to a specific version of a dependency. Currently, NPM supports the following kinds of version specifications:

  • Exact version numbers (that comply with the semver standard), e.g. 1.0.1
  • Version ranges complying with the semver standard, e.g. >= 1.0.3, 5.0.0 - 7.2.3
  • Wildcards complying with the semver standard, e.g. any version: * or any 1.0 version: 1.0.x
  • The latest keyword referring to the latest stable version and the unstable keyword referring to the latest unstable version.
  • HTTP/HTTPS URLs referring to a TGZ file being an NPM package, e.g. http://localhost/nijs-0.0.18.tgz.
  • Git URLs referring to Git repositories containing an NPM package, e.g. https://github.com/svanderburg/nijs.git.
  • GitHub identifiers, referring to an NPM package hosted at GitHub, e.g. svanderburg/nijs
  • Local paths, e.g. /home/sander/nijs

As described earlier, we can't leave fetching the dependencies up to NPM; Nix has to do this instead. For most version specifications (the only exception being local paths), we can't simply write a function that takes a version specifier as input and fetches it:

  • Packages with exact version numbers and version ranges are fetched from the NPM registry. In Nix, we have to translate these into fetchurl {} invocations, which require a URL and an output hash value as parameters, allowing us to check the result to make builds reliable and reproducible.

    Luckily, we can retrieve the URL of the NPM package's TGZ file and its corresponding SHA1 hash by fetching the package's metadata from the NPM registry, by running:

    $ npm info nijs@0.0.18
    { name: 'nijs',
      description: 'An internal DSL for the Nix package manager in JavaScript',
      'dist-tags': { latest: '0.0.18' },
      ...
      dist: 
       { shasum: 'bfdf140350d2bb3edae6b094dbc31035d6c7bec8',
         tarball: 'http://registry.npmjs.org/nijs/-/nijs-0.0.18.tgz' },
      ...
    }
    

    We can translate the above metadata into the following Nix function invocation:

    fetchurl {
      name = "nijs-0.0.18.tgz";
      url = http://registry.npmjs.org/nijs/-/nijs-0.0.18.tgz;
      sha1 = "bfdf140350d2bb3edae6b094dbc31035d6c7bec8";
    }
    

  • Version ranges are in principle unsupported in Nix, in the sense that you cannot write a function that takes a version range specifier and simply downloads the latest version of the package that conforms to it, since that conflicts with Nix's reproducibility properties.

    If we allowed version ranges to be downloaded, then the hash code inside a Nix store path would no longer necessarily refer to the same build result. For example, running the same download tomorrow might give a different result, because the package has been updated.

    For example, the following path:

    /nix/store/j631r0ak98156v1xkx22n4fsl3zbmzi8-node-slasp-0.0.x
    

    Might refer to slasp version 0.0.4 today and to version 0.0.5 tomorrow, while the hash code remains identical. This is incompatible with Nix's deployment model.

    To still support deployment of packages that depend on version ranges, we basically have to "snapshot" a dependency version by running:

    $ npm info nijs@0.0.x
    

    and create a fetchurl {} invocation from the particular version that is returned. The disadvantage of this approach is that, if we want to keep our versions up to date, we have to repeat this step every time a package has been updated.

  • The same thing applies to wildcard version specifiers. However, there is another caveat -- if we encounter a wildcard version specifier, we cannot always assume that the latest conforming version can be taken, because NPM also supports shared dependencies.

    If a shared dependency conforms to a wildcard specifier, then the dependency is not downloaded, but the shared dependency is used instead, which may not necessarily be the latest version. Otherwise, the latest conforming version is downloaded. Shared dependencies are explained in the next section.
  • Also for 'latest' and 'unstable' we must do the snapshot trick. However, we must also do something else. If NPM encounters version specifiers like these, it will always try to consult the NPM registry to check which version they correspond to, which is undesirable. To prevent that, we must substitute these version specifiers in the package.json file with '*'.
  • For HTTP/HTTPS and Git/GitHub URLs, we must manually compose fetchurl {} and fetchgit {} function invocations, and we must compute their output hashes in advance. The nix-prefetch-url and nix-prefetch-git utilities are particularly useful for this. Moreover, we also have to substitute these URLs with '*' in the package.json before we run NPM inside a Nix expression, to prevent it from consulting external resources.
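
For example, a Git URL could be translated into a fetchgit {} invocation along the following lines (a sketch; the rev and sha256 values are placeholders for the values that nix-prefetch-git reports):

fetchgit {
  url = "https://github.com/svanderburg/nijs.git";
  rev = "<revision reported by nix-prefetch-git>";
  sha256 = "<output hash reported by nix-prefetch-git>";
}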

Private, shared and cyclic dependencies


Like the Nix package manager, NPM has the ability to support multiple versions of packages simultaneously -- not only the NPM package we intend to deploy, but also all its dependencies (which are also NPM packages) can have their own node_modules/ folder containing their private dependencies.

Isolation works for CommonJS modules, because when a module inside a package tries to include another package, e.g. through:

var slasp = require('slasp');

then the node_modules/ folder of the package is consulted first and the module is loaded from that folder if it exists. Furthermore, the CommonJS module system uses the absolute resolved full paths of the modules to distinguish between module variants, not only their names. As a consequence, if the resolved path to a module with the same name is different, it's considered a different module by the module loader and thus does not conflict with others.

If a module cannot be found in the private node_modules/ folder, the module loading system recursively looks for node_modules/ folders in the parent directories, e.g.:

./nijs/node_modules/slasp/node_modules
./nijs/node_modules
./node_modules

This is how package sharing is accomplished in NPM.
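
This resolution process can be observed with require.resolve(), which returns the full resolved path that the module loader uses to distinguish module variants (the path in the comment below is merely an illustration of a possible outcome):

var slaspPath = require.resolve('slasp');
console.log(slaspPath);
// e.g.: /home/sander/nijs/node_modules/slasp/lib/slasp.js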

NPM's policy regarding dependencies is basically that each package stores all its dependencies privately unless a dependency can be found in any of the parent directories that conforms to the version specification declared in the package. In such cases, the private dependency is omitted and a shared one will be used instead.

Also, because a dependency is installed only once, it's also possible to define cyclic dependencies. Although it's generally known that cyclic dependencies are a bad practice, they are actually used by some NPM packages, such as es6-iterator.

The npm help install manual page says the following about cycles:

To avoid this situation, npm flat-out refuses to install any name@version that is already present anywhere in the tree of package folder ancestors. A more correct, but more complex, solution would be to symlink the existing version into the new location. If this ever affects a real use-case, it will be investigated.

In Nix, private and shared dependencies are handled differently. Packages can be "private" because they are stored in separate folders in the Nix store, whose paths are made unique because they contain hash codes derived from all their build-time dependencies.

Sharing is accomplished when a package refers to the same Nix store path with the same hash code. In Nix these mechanisms are more powerful, because they are not restricted to specific component types.

Nix does not support cyclic dependencies and lacks the ability to refer to a parent if a package is a dependency of another package.
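
For instance, naively encoding a dependency cycle in a Nix expression, as in the following sketch, makes evaluation fail with an infinite recursion error, because each derivation's store path is derived from the hash of the other:

with import <nixpkgs> {};

rec {
  a = stdenv.mkDerivation {
    name = "a";
    buildInputs = [ b ];  # a depends on b...
    buildCommand = "mkdir -p $out";
  };
  b = stdenv.mkDerivation {
    name = "b";
    buildInputs = [ a ];  # ...and b depends on a
    buildCommand = "mkdir -p $out";
  };
}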

To simulate NPM's way of sharing packages (and its means of breaking dependency cycles) in Nix, I ended up writing our function that deploys NPM packages (named: buildNodePackage {}) roughly as follows:

{stdenv, nodejs, ...}:
{name, version, src, dependencies, ...}:
{providedDependencies}:

let
  requiredDependencies = ...;
  shimmedDependencies = ...;
in
stdenv.mkDerivation {
  name = "node-${name}-${version}";
  inherit src;
  ...
  buildInputs = [ nodejs ... ] ++ requiredDependencies;
  buildCommand = ''
    # Move extracted package into the Nix store
    mkdir -p $out/lib/node_modules/${name}
    mv * $out/lib/node_modules/${name}
    cd $out/lib/node_modules/${name}
    ...
    
    mkdir -p node_modules
    # Copy the required dependencies
    # Generate shims for the provided dependencies
    ...

    # Perform the build by running npm install
    npm --registry http://www.example.com --nodedir=${nodeSources} install
    ...

    # Remove the shims
  '';
}

The above expression defines a nested function with the following structure:

  • The first (outermost) function's parameters refer to the build time dependencies used for the deployment of any NPM package, such as the Nix standard environment that contains a basic UNIX toolset (stdenv) and Node.js (nodejs).
  • The second function's parameters refer to a specific NPM package's deployment parameters, such as the name of the package, the version, a reference to the source code (e.g. local path, URL or Git repository) and its dependencies.
  • The third (innermost) function's parameter (providedDependencies) is used by a package to propagate the identifiers of the already provided shared dependencies to a dependency that's being included, so that they are not deployed again. This is required to simulate NPM's shared dependency mechanism and to escape the infinite recursion caused by cyclic dependencies.
  • From the dependencies and providedDependencies parameters, we can determine the required dependencies that we actually need to include privately to deploy the package. requiredDependencies are the dependencies minus the providedDependencies. The actual computation is quite tricky:
    • The dependencies parameter could be encoded as follows:
      {
        optparse = {
          ">= 1.0.3" = {
            version = "1.0.5";
            pkg = registry."optparse-1.0.5";
          };
        };
        ...
      }
      

      The dependencies parameter refers to an attribute set in which each attribute name represents a package name. Each member of this attribute set represents a dependency specification. The dependency specification refers to an attribute set that provides the latest snapshot of the corresponding version specification.
    • The providedDependencies parameter could be encoded as follows:
      {
        optparse."1.0.5" = true;
        ...
      }
      

      The providedDependencies parameter is an attribute set composed of package names and versions. If a package is in this attribute set, it means that it has been provided by one of the parents and should not be included again.
    • We use the semver utility to check whether any of the provided dependencies maps to any of the version specifications in dependencies. For example, for optparse this means that we run:

      $ semver -r '>= 1.0.3' 1.0.5
      $ echo $?
      0
      

      The above command exits with a zero exit status, meaning that there is a shared dependency providing it and we should not deploy optparse privately again. As a result, it's not added to the required dependencies.
    • The above procedure is basically encapsulated in a derivation that generates a Nix expression with the list of required dependencies that gets imported again -- a "trick" that I also used in NiJS and the Dynamic Disnix framework.

      The reason why we execute this procedure in a separate derivation is that, if we did the same thing in the builder environment of the NPM package, we would always refer to all possible dependencies, which prevents us from escaping any potential infinite recursion.
  • The required dependencies are copied into the private node_modules/ folder as follows:

    mkdir -p node_modules
    cp -r ${optparse propagatedProvidedDependencies} node_modules
    cp -r ${slasp propagatedProvidedDependencies} node_modules
    

    Now the innermost function parameter comes in handy -- to each dependency, we propagate the already provided dependencies, our own dependencies, and the package itself, to properly simulate NPM's way of sharing and breaking any potential cycles.

    As a sidenote: to ensure that dependencies are always correctly addressed, we must copy the dependencies. In older implementations, we used to create symlinks, which works fine for private dependencies, but not for shared dependencies.

    If a shared dependency is addressed, the module system looks relative to its own full resolved path, not to the symlink. Because the resolved path is completely different, the shared dependency cannot be found.
  • For the packages that are not considered required dependencies, we must generate shims to allow the deployment to still succeed. While these dependencies are provided by the includers at runtime, they are not visible in the Nix builder environment at buildtime and, as a consequence, deployment would fail without them.

    Generating shims is quite easy. Simply generating a directory with a minimal package.json file containing only the name and version is enough. For example, the following suffices to fool NPM into believing that the shared dependency optparse version 1.0.5 is actually present:

    mkdir node_modules/optparse
    cat > node_modules/optparse/package.json <<EOF
    {
      "name": "optparse",
      "version": "1.0.5"
    }
    EOF
    

  • Then we run npm install to execute the NPM build steps, which should succeed if all dependencies are correctly specified.
  • Finally, we must remove the generated shims, since they do not have any relevant meaning anymore.

Manually writing a Nix expression to deploy NPM packages


The buildNodePackage {} function described earlier can be used to manually write Nix expressions that deploy NPM packages:

with import <nixpkgs> {};

let
  buildNodePackage = import ./build-node-package.nix {
    inherit (pkgs) stdenv nodejs;
  };

  registry = {
    "optparse-1.0.5" = buildNodePackage {
      ...
    };
    "slasp-0.0.4" = buildNodePackage {
      ...
    };
    "nijs-0.0.18" = buildNodePackage {
      name = "nijs";
      version = "0.0.18";
      src = ./.;
      dependencies = {
        optparse = {
          ">= 1.0.3" = {
            version = "1.0.5";
            pkg = registry."optparse-1.0.5";
          };
        };
        slasp = {
          "0.0.4" = {
            version = "0.0.4";
            pkg = registry."slasp-0.0.4";
          };
        };
      };
    };
  };
in
registry

The partial Nix expression (shown above) can be used to deploy the NiJS NPM package through Nix.

Moreover, it also provides NiJS' dependencies, which are built by the same function abstraction. By using the above expression and the following command-line instruction:

$ nix-build -A '"nijs-0.0.18"'
/nix/store/y5r0raja6d8xlaav1mhw8jjxvx7bap85-node-nijs-0.0.18

NiJS is deployed by the Nix package manager, including all its dependencies.

Generating Nix packages from NPM package configurations


The buildNodePackage {} function shown earlier makes it possible to deploy NPM packages with Nix. However, its biggest drawback is that we have to manually write expressions for the package we want to deploy, including all its dependencies. Moreover, since version ranges are unsupported, we must manually check for updates and update the corresponding expressions every time, which is laborious and tedious.

To solve this problem, a tool named npm2nix has been developed that can automatically generate Nix expressions from NPM package.json specifications and collection specifications. It has several kinds of use cases.

Deploying a Node.js development project


Running the following command generates a collection of Nix expressions from a package.json file of a development project:

$ npm2nix

The above command generates three files: registry.nix, containing Nix expressions for all package dependencies and the package itself; node-env.nix, containing the build logic; and default.nix, a composition expression allowing users to deploy the package.

By running the following Nix command with these expressions, the project can be built:

$ nix-build -A build

Generating a tarball from a Node.js development project


The earlier generated expressions can also be used to generate a tarball from the project:

$ nix-build -A tarball

The above command-line instruction (that basically runs npm pack) produces a tarball that is placed in the following location:

$ ls result/tarballs/npm2nix-6.0.0.tgz

The above tarball can be distributed to others and installed with NPM by running:

$ npm install npm2nix-6.0.0.tgz

Deploying a development environment of a Node.js development project


The following command-line instruction uses the earlier generated expressions to deploy all the dependencies and opens a development environment:

$ nix-shell -A build

Within this shell session, files can be modified and run without any hassle. For example, the following command should work without any trouble:

$ node bin/npm2nix.js --help

Deploying a collection of NPM packages from the NPM registry


You can also deploy existing NPM packages from the NPM registry, which is driven by a JSON specification that looks as follows:

[
  "async",
  "underscore",
  "slasp",
  { "mocha" : "1.21.x" },
  { "mocha" : "1.20.x" },
  { "nijs": "0.0.18" },
  { "npm2nix": "git://github.com/NixOS/npm2nix.git" }
]

The above specification is basically an array of objects. For each element that is a string, the latest version is obtained from the NPM registry. To obtain a specific version of a package, an object must be defined in which the keys are the names of the packages and the values are their version specifications. Any version specification that NPM supports can be used.

Nix expressions can be generated from this JSON specification as follows:

$ npm2nix -i node-packages.json

And using the generated Nix expressions, we can install async through Nix as follows:

$ nix-env -f default.nix -iA async

For every package for which the latest version has been requested, we can directly refer to the name of the package to deploy it.

For packages for which a specific version has been specified, we must refer to them using an attribute name composed of the package name and its version specifier.

The following command can be used to deploy the first version of mocha declared in the JSON configuration:

$ nix-env -f default.nix -iA '"mocha-1.21.x"'

The npm2nix package, which was deployed from a Git URL, can be referenced as follows:

$ nix-env -f default.nix \
    -iA '"npm2nix-git://github.com/NixOS/npm2nix.git"'

Since every NPM package resolves to a package name and version number, we can also deploy any package by using an attribute consisting of its name and resolved version number. This command deploys NiJS version 0.0.18:

$ nix-env -f default.nix -iA '"nijs-0.0.18"'

The above command also works with dependencies of any package that are not declared in the JSON configuration file, e.g.:

$ nix-env -f default.nix -iA '"slasp-0.0.4"'

Concluding remarks


In this lengthy blog post (which was quite a project!) I have outlined some differences between NPM and Nix, sketched an approach that can be used to deploy NPM packages with Nix, and described npm2nix, a generator that automates this approach.


The reason why I wrote this stuff down is that the original npm2nix developer has relinquished his maintainership and I became co-maintainer. Since the NixOS sprint in Ljubljana I've been working on reengineering npm2nix and solving the problem with cyclic dependencies and version mismatches with shared dependencies. Because the problem is quite complicated, I think it would be good to have something documented that describes the problems and my thoughts.

As part of the reengineering process, I ported npm2nix from CoffeeScript to JavaScript, used some abstraction facilities to tidy up the pyramid code (caused by the nesting of callbacks), and modularized the codebase a bit further.

I am using NiJS for the generation of Nix expressions, and I modified it to support most Nix language concepts (although some of them can only be written in an abstract syntax). Moreover, the expressions generated by NiJS are now also pretty printed, so that the generated code is still (mostly) readable.

The reengineered npm2nix can be obtained from the reengineering branch of my private GitHub fork and is currently in testing phase. Once it is considered stable enough, it will replace the old implementation.

Acknowledgements


The majority of npm2nix is not my work. Foremost, I'd like to thank Shea Levy, who is the original developer/author of npm2nix. He has maintained it since 2012 and figured out most of NPM's internals, the mappings of NPM concepts to Nix, and how to use NPM-specific modules (such as the NPM registry client) to obtain metadata from the NPM registry. Most of the stuff in the reengineered version is ported directly from his old implementation.

Also, I'd like to thank the other co-maintainers, Rok Garbas and Rob Vermaas, for their useful input during the NixOS sprint in Ljubljana.

Finally, although the feedback period is open for only a short time, I've already received some very useful comments on #nixos and the Nix mailing list by various developers that I would like to thank.

Related work


NPM is not the only tool that does build and dependency management. Another famous (or perhaps notorious!) tool I found myself struggling with in the past was Apache Maven, which is quite popular in the Java world.

Furthermore, converters from other kinds of packages to Nix also exist. Other converters I am currently aware of are: cabal2nix, python2nix, go2nix, and bower2nix.