Anti-pattern: Semantic Code in Comments


Have you ever encountered semantic code in comments? I mean code written in a language different from the main one (it could be a DSL) that has an impact on the semantics and is part of the final behavior of the program?

I am not talking about commented-out and therefore deactivated code. No, this code still influences the behavior of your app.

Yes, it sounds weird, I know. But from time to time you can find it hidden in code comments.

This post discusses this pattern as it appears in several languages and argues why it should be considered an anti-pattern in most cases.

So, let’s do a deep dive into code archaeology and review some examples:

1997. MFC in C++ (Microsoft Visual C++)

If you are an old dev, like myself, you will probably remember this: ATL & MFC.

ATL stands for Active Template Library; together with MFC (Microsoft Foundation Classes), it shipped with early versions of Microsoft Visual C++.

In the book “Learn the MFC C++ Classes” you can find the following code snippet defining a message map for a window:

BEGIN_MESSAGE_MAP (C_MainFrame, CFrameWnd)
//{{AFX_MSG_MAP(C_MainFrame)
//}}AFX_MSG_MAP
END_MESSAGE_MAP()

Lines 1 and 4 are C++ preprocessor macros. The important lines here are 2 and 3: they are inline comments! If you remove or change their contents, you break the program. :-o They are comments, but their content is processed (by the Visual C++ ClassWizard tool) to change the final program.

The code sample came with the following prominent and scary note to readers:

Note: You must be very careful to type these comment lines in exactly as
they appear in the preceding examples. You cannot leave any extra blanks in
these comment lines. The compiler will not catch typing errors because these
are comment lines. If there is any error whatsoever in these comment lines,
your compiler tool will not be able to perform and will give unusual
messages or simply come up with a dialog box with empty entries. If there is
any problem getting your compiler tool to work, be sure to very carefully
re-examine these comment lines for correctness. I cannot emphasize this
point strongly enough!

Having worked with this technology on a project in my early years as an engineer, I can confirm the note: extra spaces used to break the parsing, and it came with thoroughly unfriendly compiler errors. So every time we touched that part we treated it as black magic: repeating the spell letter by letter without asking too much what the hell these comments meant!?

2006. PlusCal in TLA+ Comments

TLA+ is an excellent specification language for concurrent and distributed systems developed by Dr. Leslie Lamport. TLA+ comes with an Eclipse-based IDE called the TLA Toolbox. A TLA+ specification can have a PlusCal algorithm attached as a comment next to the TLA+ specification itself. See the sample (original source).

The PlusCal algorithm starts at line 46 and ends at line 230 of that file, wrapped in one long multi-line comment: (* ... *).
The algorithm is signaled by the '--algorithm' keyword.

Note how the extra curly brackets are needed to mark the traditional multi-line comments and disambiguate where the code is located.

2008. Documentation Examples in R

R is an open-source, statistics-oriented language. R has the notion of packages as the way to bundle reusable libraries for others. In a package, all public functions must be documented, which is commonly done using roxygen2 syntax. This documentation can include examples (see lines 12-13):

#' Creates a connection.
#'
#' Setup a connection to a given backend with credentials.
#'
#' @param url The base URL to connect to.
#' @param user The user credential.
#' @param pass The password for the user.
#'
#' @return connect returns a connection object for a backend.
#' This connection object is used to access resources.
#' @export
#' @examples
#' cnx <- connect("http://www.acme.com", "demo", "1234")
connect <- function(url, user, pass) {
  urlbase <- httr::handle(url)
  status <- httr::GET(handle=urlbase,
                      config=httr::authenticate(user, pass),
                      path="api/status")
  con <- list(url, user, pass, status)
  return (con)
}

In this case the editor (RStudio) has partial support for coloring keywords inside the comments.

Also, the R environment can build and check the package. This not only compiles the R code (lines 14-21), but also generates the documentation, picks up the example code on line 13, and evaluates and runs it to double-check that the sample is valid.

OK, not a best practice, but at least we can run the code in the comments and check that it works.

However, don’t ask for debugging capabilities inside examples. =:-o

 

2009. Metadata in Go

We tend to repeat our errors over and over, and this one especially hurt my heart: honestly, I did not expect to see this kind of design error in a modern language like Go. I am used to reflection and metadata in languages like Java and .NET, where metadata is a first-class citizen of the language. Both are prior art to Go, so my expectations here were very high.

But in Go, the designers decided that a string would be enough and implemented metadata as struct tags.

Syntactically speaking, a struct tag in Go is a string literal attached to a struct field and surrounded by backticks (`).

This is what metadata looks like in Go:

type ExampleType struct {
    Name    string `url:"name"`
    Address string `url:"address"`
    City    string `url:"city"`
}

This again suffers from the same problems as the previous examples: you cannot type-check the content of the metadata, or even check at compile time whether the format is valid for the data type it is attached to.

The tag expression is a string. If you want to inspect it, you need to parse it to extract its values at runtime (the reflect package handles the conventional key:"value" layout, but anything beyond that is manual work). :-o

For simple metadata like the one presented, we can hack it with a regex, as sketched below. But for complex ones, there is no simple path, my friend.
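Just to make the regex hack concrete, here is a minimal sketch. It is written in TypeScript only to keep the extra examples in this post in a single language, and the parseTag helper is purely hypothetical; in real Go code, reflect's StructTag.Get already resolves the conventional key:"value" pairs for you.

// Hypothetical helper: extract key:"value" pairs from a Go-style tag string.
function parseTag(tag: string): Record<string, string> {
  const result: Record<string, string> = {};
  // Matches pairs like url:"name", separated by spaces.
  const pairPattern = /(\w+):"([^"]*)"/g;
  for (const match of tag.matchAll(pairPattern)) {
    result[match[1]] = match[2];
  }
  return result;
}

// parseTag('url:"name"') returns { url: "name" }.

For anything richer than flat key/value pairs, though, you end up writing and maintaining a small parser of your own.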

 

2016. Templates in Angular

This should be familiar to you if you are a web developer using Angular or similar frameworks: embedding one language inside another as strings. HTML inside JS, CSS inside JS, JS inside HTML, HTML containing JS emitting HTML (with JS)... endless fun!

Take a look at the following Angular component:

import { Component } from '@angular/core';
@Component({
  selector: 'my-app',
  template: `<h1>Hello {{name}}!</h1>`,
  styles: [`h1 { color: red; }`]
})
export class AppComponent {
  name = 'world';
}

 

Look at lines 4 and 5. The template is embedded HTML and styles is an array of embedded CSS. All of this lives inside a TypeScript file, by the way. In fact, they are just plain strings. The editor or IDE is not going to parse that HTML or CSS for errors.

Fortunately, the Angular team already recognized the anti-pattern, and the default recommendation is to extract the HTML and CSS to external files, replacing the template property with templateUrl and styles with styleUrls, respectively.

Inline templates should be used only for brief examples with trivial embedded markup. The problem arises when the markup grows and the refactoring is not done in time.

Having external files is always good because IDEs already know how to parse and colorize HTML and CSS, for example.
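As an illustration, here is a minimal sketch of the refactored component; templateUrl and styleUrls are the standard @Component properties, while the file names below are just hypothetical examples:

import { Component } from '@angular/core';

// Sketch of the refactor: the markup and styles now live in their own files
// (the file names are only an example), so the IDE can treat them as real
// HTML and CSS instead of opaque strings.
@Component({
  selector: 'my-app',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent {
  name = 'world';
}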

 

Cons

What problems arise with this approach? We already saw them, but let’s recap:

 

  1. Comments always compile. Therefore, errors in this second DSL are not easily discoverable, leading to latent errors hidden in comments.

Sample: no type checking of the MFC comment markers, no type checking of the metadata in Go struct tags.

Exception: although the documentation examples in R are written in comments using roxygen2 syntax, the R toolchain provides tooling to build and check the documentation and, moreover, to execute the R code in the examples to double-check that your code is error-free.

  2. Comments are not parsed by traditional IDEs. Therefore, nothing inside the comment is tokenized or colored in your favorite IDE. No code completion available. What did you expect? It is a comment, after all! Blind coding, old style; use vi as your editor for extra bonus points.
  3. Separation of concerns violation: two concerns intermingled, one of them hidden in a comment. Not a good look, ma’.

Magic strings are bad, right? As we just reviewed, code hidden inside comments is evil.

 

But why do people do this?

First of all, I need to raise my hand and say I already did it once. Only once, but I did it: I put a DSL inside the comments of another DSL. I had to live with the technical debt it generated throughout its life cycle. I regret doing so, and this post is my shared friendly reminder to myself not to do it again.

But why do language designers end up with this design? Here are some reasons:

Minimal Implementation Cost (Short Term)

As the code is a comment, there is no need to change the grammar, parser, and tooling associated with the host language. What is needed is a second grammar and parser to find and extract the expressions of the parasitic language hidden in the comments.

Then, with both languages parsed, the main output can be adjusted to reflect the behavioral changes needed. Of course, this cost estimate does not include maintenance, or the time lost by users chasing bugs.

Organic Language Growth

The main language is usually developed first. Then, sooner or later, a need for extra expressiveness appears, and a second, parasitic DSL is born. Putting the new language's expressions in comments of the first one allows the new semantics to be added in a quick-and-dirty way: you don't need to change the grammar, parsing, and tooling associated with the first language. The parsing and processing of the second language is added as a second stage of the build process.

Lack of Design

Quick additions, made while the second DSL is not yet settled, can take the comments route because it is the cheapest implementation.

Technical Debt

But in the end, the DSL remains in comments; the refactoring to make it a first-class citizen of the existing language, or to compose it properly with the first one, is never done. The result is heavy technical debt, clearly visible at the language design level.

Backward Compatibility

If the language is already in use, then we have customers already programming in comments. So, after the first public release, removing the comment-based syntax is a breaking change that is difficult to explain and possibly painful to migrate away from.

 

Before concluding, let's review another example of mixing languages: very controversial, but a bit different.

2016. React JSX

Focus on the sample:

const element = (
  <h1>
    Hello, world!
  </h1>
);

React introduced JSX, and later TSX, as extensions to JavaScript and TypeScript respectively, to be able to embed the construction of HTML nodes inside an imperative language. Lines 2-4 are HTML embedded inside the JS/TS code.

But this time it is quite different: the code is neither a comment nor a string anymore. It is properly parsed and compiled into a createElement() call that builds the node. Tool support provides colorization and code completion in the appropriate IDE.
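As a rough sketch of what that means, with the classic JSX transform the snippet above becomes an ordinary function call (the exact output depends on the transform configured in your build):

import React from 'react';

// Approximately what the JSX sample compiles to: a plain createElement() call
// that builds the <h1> element, instead of a comment or a magic string.
const element = React.createElement('h1', null, 'Hello, world!');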

Some love this mix; others don't. My opinion is biased toward the second camp due to separation-of-concerns issues, but that is another story for another post.

 

Conclusions

We have reviewed different languages from different eras and have seen how we keep committing this kind of error when two languages are mixed or composed. In the end, users suffer the errors caused by a deficient language integration.

As stated before, I committed this error once, and this is my public promise to myself to try never to repeat it. If this post keeps even one other person from doing it wrong, my objective with this post is achieved.

Do you have more samples of other language pairs embedded as comments? I would like to know. Feel free to send me a link and I will include them with proper credit.

2 comments.

  1. PHP is full of neat tricks that try to do type hinting for parameters and return values in comments, javadoc style. It is of no use at runtime (and there is no compilation, as it is a scripting language), but it assists the IDE and some linters in trying to detect errors in the usage of functions and methods.

    The reason is simple: if the language creators don't evolve the language to accommodate the need for new, more complex paradigms (er… like… basic type validation? so new… but you can enforce class validation, so strange…), with years spent waiting for PHP 7, developers extend it in the only way they can: with comments.

    But you have pointed out some capital sins here, like the Go creators not taking that into account in their design from the beginning. Very interesting indeed.

  2. Indeed, Vicente!
    I suppose that exact lack of types is what motivated Facebook to create Hack, to add types to PHP and get a compile-time chance to catch more bugs.
