Choosing a Template Engine for code generation

Selecting a good Template Engine is an important issue when dealing with code generation.

Since 1998 I’ve been testing lots of them and forming an opinion as the technology evolved. Being at CG2009 and attending Kathleen Dollard’s session on Template Specialization brought to mind all the times I wasn’t satisfied with my current template engine and switched to a new one.

Building commercial code generators, I’ve tested lots of techniques: direct string concatenation, XSLT, direct code generation (or what Kathleen calls brute force code gen), ASP- and JSP-based approaches, developing custom template engines (two of them), taking a look at CodeSmith, T4 and others, using NVelocity and finally arriving at StringTemplate (for the moment).

The essence of Code Generation

Most of the time, code generation means producing output source code as text files.
In the text-based generation process, the model is used as metadata by the control (a transformation program or control rules) to fill in a text template (basically a text form with named holes).
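To make the three roles concrete, here is a minimal sketch in Python (not one of the engines reviewed in this post; it is used only because it is short). The hole names `$table` and `$param`, the `generate` function and the dictionary-shaped model are assumptions made up for illustration.

```python
from string import Template

# The template: a text form with named holes ($table and $param are illustrative names).
SELECT_TEMPLATE = Template(
    "select Id, Name, Surname\n"
    "from $table\n"
    "where Country LIKE @$param\n"
)


def generate(model: dict) -> str:
    """The control: reads the model (metadata) and fills the holes of the template."""
    return SELECT_TEMPLATE.substitute(table=model["table"], param=model["param"])


# The model: plain metadata, knowing nothing about SQL text layout.
model = {"table": "Customer", "param": "vCountry"}
print(generate(model))
```

The template stays readable as SQL, and the model and the control can be swapped independently of it.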

Models, transformation algorithms and templates are core assets to be reused and recombined in different ways across the MDD discipline. So, selecting a good technology to mix them all together in the proper way is essential for success.

When writing templates for code generation (M2T, model to text) or model transformation (M2M, model to model) you need to keep at least three languages in mind at the same time: the input language (or model), the transformation language itself and the output language.

Usually the input and output models/languages are complex enough that we want the transformation tool to be as easy to use as possible, while still being powerful enough to do the job.

Pros and cons for each template engine approach can only be stated with respect to some requirements. Different requirements imply different selection criteria. So let’s start by making explicit my requirements for a text template engine.

My requirements

  1. Direct recognition of the final output. Easy to read and understand. A template represents a piece of output in the target language.
  2. Easy to maintain. Templates are created to be changed. And when needed, the change should be done in a quick and agile way.
  3. Reusable. Templates should be reusable in different contexts.
  4. Minimalistic. Due to requirements (1) and (2), a template language must resemble the final output, with the minimum of distracting extra lexicon that does not belong to the output language.
  5. Composable. A template must be callable from other templates.
  6. Separation of Concerns. Separate the control (the transformation) from the template. MVC Pattern.
  7. Idempotent. A template execution must be repeatable whenever needed. Reapplying the template should always generate the same output, no matter how many times you call it. Therefore, it should contain no side effects: at most the timestamp line recording when the generation took place, if needed.
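Requirements 6 and 7 can be sketched together. The following is only an illustration in Python, not the API of any engine discussed here: `render`, `HEADER`, `BODY` and the model layout are hypothetical names. The point is that the output depends only on the arguments, and that the optional timestamp is the single, explicitly passed source of variation.

```python
from string import Template
from typing import Optional

# Two small templates; the timestamp header is the only part allowed to vary.
HEADER = Template("-- generated at $stamp\n")
BODY = Template("select $columns\nfrom $table\n")


def render(model: dict, stamp: Optional[str] = None) -> str:
    """Control code kept apart from the templates; no hidden state, no side effects."""
    header = HEADER.substitute(stamp=stamp) if stamp is not None else ""
    body = BODY.substitute(columns=", ".join(model["columns"]), table=model["table"])
    return header + body


model = {"table": "Customer", "columns": ["Id", "Name", "Surname"]}

# Reapplying the template to the same model yields the same text.
assert render(model) == render(model)

# The timestamp comes in as an argument instead of being read inside the template.
print(render(model, stamp="2009-06-20"))
```

Passing the timestamp in from the outside keeps the template itself repeatable, so two runs can be compared with a plain text diff.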

Now we can review the main pros and cons of the technologies involved with respect to these requirements:

Direct string concatenation (aka Brute Force approach)

This is one of the first techniques we all try. It’s easy and direct. You start by choosing your favorite language, like C#, and do things like this:

Output code:

        select Id, Name, Surname
        from Customer
        where Country LIKE @vCountry

Generator:

        // tableName comes from the model, e.g. "Customer"
        string sql = "";   // must be initialized before += can be used
        sql += " select Id, Name, Surname";
        sql += " from " + tableName;
        sql += " where Country LIKE @vCountry";
        string sqlRes = sql;

If you are a bit more sophisticated (or geek enough!!) and concerned about performance, you will swap string concatenation for a buffer-based approach like this one:

        System.Text.StringBuilder sql = new System.Text.StringBuilder();
        sql.Append(" select Id, Name, Surname");
        sql.AppendFormat(" from {0}", tableName);
        sql.Append(" where Country LIKE @vCountry");
        string sqlRes = sql.ToString();

But at the end of the day you will still not be fulfilling most of the requirements. Such a template is not easy to maintain, mainly because we are mixing the transformation language (C#) with the output one (SQL in this case).

As an extreme case to illustrate how much worse things can get, I remember concatenating strings in C++ in 1999 to generate ColdFusion code. That CF output code was, in turn, generating JavaScript on the fly, which finally generated some kind of Dynamic HTML for a web application. It’s weird, isn’t it?

Character escaping, like \” to add a quote in the right place, or thinking twice about whether a semicolon was needed, was a total nightmare, on top of trying to keep the syntax of four languages in mind at the same time.

XSLT

XSLT (Extensible Stylesheet Language Transformations) is a W3C standard for transforming XML into different outputs such as XML again, HTML, text, code, etc.

The original idea was quite good: XML needs to be transformed quite frequently to adapt to special needs. And expressing an XML transformation in an XML-based language makes sense because you are apparently using the same base language.

However, if you have ever used XSLT you will have found that it is not user-friendly and has very poor legibility; therefore, it ends in maintainability hell. A week after coding a template, you will start having trouble remembering what it was trying to solve.

A sample:



    
        
            select Id, Name, Surname
            from 
            where Country LIKE @vCountry  
        
    

So, if you are in a professional context, you need easy maintainability, and XSLT shouldn’t be your first choice (nor the second, unless you can generate the XSLT itself).

It is also a paradox that XML was designed to be human-readable, yet XSLT isn’t once you start creating non-trivial samples.

XSLT is not bad for doing simple formatting of XML. For example, generating HTML from XML with XSLT is OK as long as you don’t try to change the structure of the input tree too much (the output must follow the same tree structure or be a subtree of it).

Anyway, mixing the XML syntax of XSLT with an output like HTML makes the whole thing much more cryptic.

CodeDom

CodeDom is a powerful (but not nice) API allowing any .NET program to generate .NET code in any .NET language. The price to pay is that CodeDom flies at a very, very low altitude, close to the compilers and the underlying MSIL instructions.

The code is quite complex even for trivial constructions. On the other hand, it provides language independence, allowing you to generate C#, VB.NET or any other .NET language as soon as a CodeDom provider is implemented for it.

Visual Studio uses CodeDom intensively to do things such as generating code for controls whenever you drag & drop a control on the designer surfaces. And it does it in the language you are working with.

If you don’t need to generate code in such context, avoid the usage of CodeDom.

A hardcore version of CodeDom is to use the Emit() functions to create actual MSIL (like bytecode in Java) on the fly. But this is more a hacking technique, or something for the back end of a compiler, than something our usual work requires.

ASP/JSP-like syntax (CodeSmith and others)

ASP-based languages mix XML tags with code inserted in between.

Having the opportunity to add code inside the template gives you the “bad” chance to mix presentation (the template) with control (the code you apply to drive the transformation).

It’s interesting to see how the web developers of the Java community moved away from this kind of practice and started to use the MVC (Model View Controller) pattern, which provides a strict separation, as they did with the Struts framework. The Ruby guys did the same with Ruby on Rails. And finally ASP.NET is doing the same with the MVC framework for .NET.

So, if web developers have noticed that having templates and control code mixed in the same file is not desirable for maintainability and reuse, why are we, the designers and developers of code generators, not seeing it in general?

NVelocity

NVelocity is the .NET port of the Apache Velocity project. I used it for two years because it has good features for splitting templates into manageable chunks and calling subtemplates whenever needed. Control constructs were beginning to be limited to keep the templates as simple as possible. I liked the approach and used it extensively, but it still left too much control open.

Xpand

On the Java side of the world, Eclipse guys have another approach called Xpand.

openArchitectureWare and other projects use it extensively, and it is well supported in Eclipse with syntax coloring and code completion features.

To be honest, I haven’t tested it yet, but I promise to do so soon. Therefore, I will not comment further on it before trying it.

T4

With the arrival of Microsoft DSL Tools, the T4 template engine came onto the scene.
T4 generally runs inside Visual Studio (it can also be executed from the command line) and each *.tt file generates one dependent code file.

The first thing I don’t like is the one-to-one mapping. Usually I need to apply my code generation as a one-to-many operation, producing a collection of files. It’s true that there are workarounds allowing you to generate multiple files from a T4 template. But that is more a hack than a built-in feature of the engine.
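What I mean by a one-to-many operation can be sketched as follows. This is a plain Python illustration, not T4 and not any engine’s real API: `generate_all`, `CLASS_TEMPLATE` and the entity dictionaries are hypothetical. One template is applied to every element of a model collection, and the control code decides the output file for each one.

```python
import tempfile
from pathlib import Path
from string import Template

# One template, reused for every entity in the model.
CLASS_TEMPLATE = Template("public class $name\n{\n}\n")


def generate_all(entities: list, out_dir: Path) -> list:
    """Apply the template once per entity and write one output file per entity."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for entity in entities:
        target = out_dir / f"{entity['name']}.cs"
        target.write_text(CLASS_TEMPLATE.substitute(name=entity["name"]), encoding="utf-8")
        written.append(target)
    return written


entities = [{"name": "Customer"}, {"name": "Order"}, {"name": "Invoice"}]
with tempfile.TemporaryDirectory() as tmp:
    files = generate_all(entities, Path(tmp))
    print([f.name for f in files])
```

The file-naming decision lives in the control code, so the template itself stays unaware of how many files are produced.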

As a derivative of the ASP syntax, control and template are mixed. Templates are compiled before being executed. This should be great for generation speed. However, generation usually takes longer inside Visual Studio than with other technologies based on interpretation. I do not know why; maybe the bottleneck is an external dependency in the generation process, such as loading the input models.

On the pros side, the templates have IntelliSense support and syntax coloring inside Visual Studio, which makes editing easier.

Of course, T4 was designed as a convenient extension point for adding code generation to a Visual Studio project. However, it also embodies some design choices that are not necessarily optimal for other uses. This is mainly my case: I’m not saying T4 is useless. It is simply not well suited to my needs as stated in my requirements above.

StringTemplate

Finally, I’ve been using Terence Parr’s great ANTLR parser generator for a long time, and it was only a matter of time before I tried its companion template engine: StringTemplate.

I like Terence Parr’s StringTemplate approach because it enforces a strict Separation of Concerns when doing code generation: it separates the template (a text form with holes and nothing else) from the model and from the transformation itself (the control code that applies the template).

For years I’ve been looking for something this well aligned with my ideas, and I finally felt that I had found it.

Terence has a nice article to read to understand the point of view: Enforcing Strict Model View separation in Template Engines.

A StringTemplate sample:

       
group sql-gen;

genSelect(tbl)::=<<
select $tbl.Fields:genColumn(); separator=", "$
from table $tbl.Name$
>>

genColumn()::=<<
[$it.Name$]
>>

In StringTemplate each template can receive parameters. A template can call a subtemplate and express nice formatting features like separators or automatic indentation. Iterations are expressed in the following way: $collectionExpression:templateToApply()$. But nothing more: there is no further room for control logic. $it$ stands for the model item being iterated (an implicit “this” reference, if you like).

The template is reusable. It will work whenever you pass an object that has a property called Name and a collection called Fields whose items in turn expose a Name property. These are all the requirements the template imposes on the input model.
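To show what the sample above computes, here is a rough Python emulation of its behavior. It does not use StringTemplate itself; `gen_select` and `gen_column` are hypothetical counterparts of genSelect and genColumn, and `SimpleNamespace` just stands in for any model object with the right property names.

```python
from types import SimpleNamespace


def gen_column(it) -> str:
    """Counterpart of genColumn(): needs only a Name attribute on the item."""
    return f"[{it.Name}]"


def gen_select(tbl) -> str:
    """Counterpart of genSelect(tbl): apply gen_column to each field, separated by ', '."""
    columns = ", ".join(gen_column(field) for field in tbl.Fields)
    return f"select {columns}\nfrom table {tbl.Name}"


# Any object with a Name and a Fields collection will do; the type does not matter.
customer = SimpleNamespace(
    Name="Customer",
    Fields=[SimpleNamespace(Name=n) for n in ("Id", "Name", "Surname")],
)
print(gen_select(customer))
```

The separator logic that brute-force generators usually hand-code with an "is this the last item?" test is handled in one place, which is the convenience the separator option gives you in the template.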

During the CG2009 conference, and also during CG2008, our Microsoft colleagues kept asking us why we use DSL Tools but skip T4 templates. I promised myself I would write a blog post like this one to try to explain my reasons.

To sum up

From an MDD point of view, text template engines constitute a well-defined domain. Having a precise DSL to deal with that domain makes perfect sense in this context. Therefore, the language must be designed to express what it is expected to do in the best way, without introducing any unneeded fanfare.

And you? What do you think about it? What is your experience in the usage of template engines?

5 comments.

  1. Hi Pedro! Really nice overview. You should give Xpand a try, the new version has been released this week with Eclipse Galileo. I think it fulfills your requirements quite well, although it’s not .NET. Would be interesting what your opinion is.
    ~Karsten

  2. Thanks Karsten! For sure, I will. I’ll take the time to test Galileo with the full package.

  3. Due to my experiences over the previous 7 years, I myself am not a big fan of using template engines for code generation. I am not an opponent either, though. Some years ago I was using ArcStyler (Interactive Objects GmbH) with its template and generation engine CARAT for several years. That worked quite fine for me. Then, back in 2006, I was searching for an easy-to-use object-oriented generation framework/tool that doesn’t use the template approach as a core concept. Reason for that: I wanted to re-use all the benefits that OOA/OOD/OOP brings to the table and I wanted to have a flat learning curve, for me and for others. Last but not least I wanted to re-use the existing features of (Java-)IDEs. I did not find such a framework. That made me start designing and developing my own, simple object-oriented generation framework „JenerateIT“. Meanwhile that framework has matured and has become a general purpose code generation tool that comes with Eclipse integration and is backed by Generative Software GmbH, a company founded in 2007.

    Let me try to outline how that object-oriented generation framework „JenerateIT“ performs in relation to the 7 requirements you have listed further above. Please note that if there were a „JenerateIT for Visual Studio“, you would have a similar situation for the C#/C++/.NET world. In order to avoid misunderstandings: JenerateIT does not do simple System.out.println() calls, but acts on a higher level. Amongst others this approach frees you from having to write to a file sequentially. Instead, when generating a Java file, you for instance can add import statements while generation logic is just writing the implementation of a method.

    Direct recognition of the final output
    =========================================
    Since object-oriented code is not procedural, the code that contributes to the generation of a file cannot be seen at a glance. That’s the price you have to pay in order to get all the advantages mentioned in the remainder of my post.
    To improve the readability and comprehensibility of generation logic, JenerateIT comes with traceability functionality to directly show the relation between generated code and generation logic that created it. Also, you can directly jump from generated code to the generation logic code.

    Easy to maintain
    =========================================
    Changing a template in JenerateIT (= a set of classes that implement generation logic) is equivalent to changing object-oriented Java code. So you may imagine how easy that is. Surely, you have to be familiar with Java and with object-orientation.
    Additional functionality that eases maintenance tasks is mentioned in the last part of my comment on „Direct recognition of the final output“ further above.

    Reusable
    =========================================
    From anywhere in the generation logic you have full access to all other available generation logic. And that can easily be re-used.

    Minimalistic
    =========================================
    When you generate Java code with JenerateIT, your target language is the same as the language for programming generation logic.

    Composable
    =========================================
    Here the same comments hold true as were given for the „Reusable“ requirement further above.

    Separation of Concerns
    =========================================
    JenerateIT doesn’t care about the content that your generation logic is going to write into the generated files. It is responsible only for controlling your generation logic, handling model access and file input and output.

    Idempotent
    =========================================
    With JenerateIT you can re-generate a file as often as you want. The content is not changed, and also, if the content doesn’t change, the physical file is not being overwritten. So it keeps the original timestamp. There is an exception from that rule: if your generation logic generates timestamp information or random data into a file, obviously that file will change with every re-generation.

  4. Thanks, Marcus, for commenting on a different approach and comparing it using the same terms.

    When you said: “Instead, when generating a Java file, you for instance can add import statements while generation logic is just writing the implementation of a method.”

    Then, if I understood correctly, the template engine takes into account that you are generating Java code and already knows what an “import” means in this context? So we can say that your code generator is specifically targeted at Java output, can’t we?

  5. The generation tool JenerateIT itself doesn’t know about how to generate Java files. It is a general purpose generation tool. It can be used to generate any kind of file, be it C++, Java, XML or any other file type.

    JenerateIT provides a plug-in mechanism to add generation logic to be controlled by JenerateIT. We, amongst others, have developed generation logic that knows how to generate a Java file. That generation logic uses JenerateIT’s plug-in mechanism and JenerateIT’s API.

    JenerateIT, through its API, provides the means to split a to-be-generated file into different sections. With this, generation logic at any time can write to any section of the to-be-generated file. The generation logic actually is writing to specific buffers and not to the physical file. The physical file then is written only at the end of the code generation process.
    The aforementioned generation logic for the creation of Java files utilizes that means, for instance to generate import statements in an efficient way.

    Does this throw light on my original comment and answer your question?

    By the way: in a few places in generation logic we use StringTemplate to generate chunks of pure text that have only little or no variability.
