Welcome to Muck and Brass, the Snowtide blog site    

News from December, 2009

blog entry  2009/12/04
Last changed: Jan 23, 2010 07:42 PM by Chas Emerick

It's strange how some days or weeks have running themes. One theme for me this week programming-wise has been string interpolation:

  • I mentioned it in the #clojure channel on freenode earlier this week (sounds like Rich Hickey isn't a fan of the concept in general, yet),
  • Miles and I talked about it some in connection with the Clojure templating system he's been working on (plug: after recording another episode of the Strictly Professional podcast),
  • and just this morning, I noticed a post by Vassil Dichev about how one might implement string interpolation in Scala

I've become weary of format of late, and all of the other formats out there aren't any more pleasant – variadic (and even keyword or named-argument) string replacement is just a dull tool compared to real interpolation.

The Scala implementation post was the last straw for me, especially because (with all due respect to Vassil, as he's doing very well with the materials he has at his disposal) it showcases so many of the aspects of Scala that I came to dislike in the course of using it for a year or so: the tortured syntax; the rope, nay, the barbed wire that is implicit conversions; the bear trap of traits.

A Clojure Implementation

OK, enough flame-bait. What I'm really here to do is show how easy it is to add string interpolation to Clojure, and how simple its implementation is:

(ns commons.clojure.strint
 (:use [clojure.contrib.duck-streams :only (slurp*)]))

(defn- silent-read
  [s]
  (try
    (let [r (-> s java.io.StringReader. java.io.PushbackReader.)]
      [(read r) (slurp* r)])
    (catch Exception e))) ; this indicates an invalid form -- s is just string data

(defn- interpolate
  ([s atom?]
    (lazy-seq
      (if-let [[form rest] (silent-read (subs s (if atom? 2 1)))]
        (cons form (interpolate (if atom? (subs rest 1) rest)))
        (cons (subs s 0 2) (interpolate (subs s 2))))))
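  ([#^String s]
    ; start at whichever marker comes first; taking the max of the two indexes
    ; would skip an earlier ~{...} whenever a ~(...) appears later (or vice versa)
    (if-let [start (first (sort (remove #(== -1 %)
                                  (map #(.indexOf s %) ["~{" "~("]))))]
      (lazy-seq (cons
                  (subs s 0 start)
                  (interpolate (subs s start) (= \{ (.charAt s (inc start))))))
      [s])))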

(defmacro <<
  [string]
  `(str ~@(interpolate string)))

Don't mind the namespace – that's just where we put extensions to Clojure-the-language. The public macro << (named as an homage to heredocs) takes a single string argument, and emits a str invocation that concatenates the string data and evaluated expressions contained within that argument.
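To make that concrete, here is a hand-written sketch of the two steps for a single string. It is not output captured from the macro itself, and n is defined here only so the snippet stands on its own. The first step reads one form off the front of some text, the way silent-read does. The second is the kind of str call left behind once the string has been split into chunks and forms.

```clojure
(def n 99)

;; Step 1: read pulls exactly one form off the front of the text that
;; follows the ~ marker; everything after it is left for further scanning.
(def first-form
  (read (java.io.PushbackReader.
          (java.io.StringReader. "(dec n) bottles"))))
;; first-form is now the list (dec n), unevaluated

;; Step 2: the string chunks and forms are spliced into a str call, so
;; (<< "There's ~(dec n) bottles") ends up equivalent to:
(def expanded (str "There's " (dec n) " bottles"))
;; expanded is "There's 98 bottles"
```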

Example Usage

First, let's get a value we can refer to:

commons.clojure.strint=> (def n 99)

You can do simple value replacement:

commons.clojure.strint=> (<< "There's ~{n} bottles of beer on the wall...")
"There's 99 bottles of beer on the wall..."

And evaluate arbitrary code:

commons.clojure.strint=> (<< "There's ~(dec n) bottles of beer on the wall...")
"There's 98 bottles of beer on the wall..."
commons.clojure.strint=> (<< "There's ~(seq (range n 90 -1)) bottles of beer on the wall...")
"There's (99 98 97 96 95 94 93 92 91) bottles of beer on the wall..."

You can use any functions or macros you have available in your Clojure environment:

commons.clojure.strint=> (defn- some-function [] {:name "Chas" :zip-code "01060"})
#'commons.clojure.strint/some-function
commons.clojure.strint=> (<< "My name is ~(:name (some-function)), it's nice to meet you.")
"My name is Chas, it's nice to meet you."

...including interop with Java methods:

commons.clojure.strint=> (<< "You have approximately ~(.intValue 5.5) minutes left.")
"You have approximately 5 minutes left."

Caveats

First, let's say what's wrong with this implementation compared to, say, Ruby's string interpolation (I may be missing other points, I'm no Ruby hacker):

  1. Strings cannot be used within interpolated expressions; e.g. this will cause a straightforward parse exception:
    commons.clojure.strint=> (<< "~(str n "another string")")
    #<CompilerException java.lang.IllegalArgumentException:
         Wrong number of args passed to: strint$-LT--LT-
    

    The Clojure reader sees this as providing three arguments to the << macro. Being able to use strings within interpolated expressions would require a "native" Clojure reader macro for interpolated strings, or the ability to define reader macros in "userspace" (Clojure's read table cannot be modified in Clojure code right now; this is an intentional design decision).

    Update: pmjordan mentioned on hackernews that you can get around this by escaping the nested strings, like so:

    commons.clojure.strint=> (<< "~(str n \" another string\")")
    "99 another string"
    

    Very true, and very useful in a pinch, but I would definitely consider it to be a wart (and an issue that is insurmountable from Clojure userland right now).

  2. Heredocs aren't available. That's a far more general shortcoming compared to other languages, but is still related to string interpolation. This is significantly mitigated by the fact that Clojure strings are multiline already, but it would be nice in some circumstances to be able to specify a block of text using different delimiters for one-off templating, etc.
  3. Lazy sequences need to be made strict in order for them to print as they do at a REPL (thus the additional seq invocation around (range n 90 -1) in the example above).
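The third caveat is easy to see without the macro at all, since << just hands each expression's value to str. A minimal sketch (I'm using map rather than range here, on the assumption that map returns a lazy sequence in whatever Clojure version you're running):

```clojure
;; str on a lazy sequence gives you the object's class name and hash,
;; not its contents
(def lazy-result (str (map inc [1 2 3])))
;; lazy-result starts with "clojure.lang.LazySeq@"

;; calling seq first yields a concrete seq, which str prints as a list
(def strict-result (str (seq (map inc [1 2 3]))))
;; strict-result is "(2 3 4)"
```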

Advantages

I'm sure a lot of people will look at this implementation and say, "so what?". Well, it's got a lot going for it:

  1. Simple implementation. Unless you've got a Pavlovian aversion to parentheses (but are somehow immune to piles of braces?), it's very comprehensible.
  2. It's user-land code. Many languages would require a compiler extension or modifications to the language core to pull this off.
  3. The interpolation happens at compile-time! The only processing that occurs at runtime is the concatenation of the chunks of each string, but all of the string and expression parsing happens before your code using the << macro would hit a customer's server or desktop. This is decidedly in contrast with the Scala interpolation implementation, where all of the string parsing is done at runtime; to my knowledge, doing anything else would require a compiler plugin there.
  4. It's fully composable with all other Clojure code. There's no restriction on where you can use the << macro, and no restriction on what Clojure (or Java!) code you can include in interpolation expressions.
  5. There's no magic. Many languages make it very easy to inject magical – as in, opaque – behaviour into your code. The Scala interpolation implementation is no different – to get that special behaviour out of a String, one must call a magical method i in order to rope in the machinery around the InterpolatedString implicit conversion. On the other hand, all of the effects and actors involved in the << macro are local, and its semantics and calling conventions are exactly the same as any other Clojure macro.
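Point 3 is worth seeing in miniature. The sketch below uses a made-up macro, shout, and a made-up counter, expansions (neither is part of the interpolation code), to show that a macro's body runs when the calling code is compiled, not each time that code is called:

```clojure
(def expansions (atom 0))   ; counts how many times the macro body runs

(defmacro shout
  "Hypothetical macro: upper-cases a string literal while the code is compiled."
  [s]
  (swap! expansions inc)
  (.toUpperCase s))

;; the macro body runs while this definition is compiled
(defn greet [] (shout "hello"))

(def expansions-before-calls @expansions)
(def results [(greet) (greet) (greet)])
;; results is ["HELLO" "HELLO" "HELLO"], and @expansions is unchanged
;; by the three calls: the work was already done at compile time
```

The << macro is in the same position: the parsing in interpolate is paid for once per call site, and only the str call survives to runtime.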

Exhale...

So, hopefully that puts string interpolation behind me. I'd love to see something like this become a reader macro in Clojure someday (maybe in conjunction with heredoc support), but in the meantime, this will make a lot of one-off templating jobs a whole lot easier in Clojure compared to using the usual variadic string replacement methods that are otherwise available.

Posted at 04 Dec @ 1:19 PM by Chas Emerick | 0 comments

blog entry  2009/12/28
Last changed: Dec 28, 2009 04:24 PM by Chas Emerick

Over the past month, I've been gradually porting all of our projects' builds from Ant to Maven. Everything's gone swimmingly, especially given the excellent clojure-maven-plugin, which allowed me to cleave off all of our comparatively complicated ant scripts for building and testing Clojure code. One part that did require some work was the porting of the builds associated with our NetBeans Platform-based applications – so, I thought I'd post a couple of hints to help others over the rough spots.

A plug for NetBeans
We've had a good deal of success in using the NetBeans Platform recently (often referred to as the NB RCP). It provides a metric ton of fairly high-quality plumbing for thick-client applications, and definitely saved our asses in a couple of key areas insofar as we've been able to reuse large pieces of the Platform, essentially unchanged, to meet critical new requirements. Of course, that's why we chose to use it in the first place.

Extemporaneous and Lengthy Background

To be clear, the rough spots in question aren't associated with the actual Mavenization of the NetBeans Platform-based projects – that's a relatively straightforward affair, with archetypes available in the NetBeans IDE to get one started, and very well-documented goals available, all provided by the NBM Maven Plugin. Given an existing ant-based build process, I found the actual porting of the build fairly straightforward.

The dicey part had to do with having a set of Platform artifacts available to build against. Under the ant-based build regime, it was common for those building on top of the NB RCP to keep a set of RCP artifacts available in every build environment. This was always a pain (for potentially-obvious reasons that I don't really want to get into now), and the general non-composability of the ant-based build process drove NB RCP users (and the Platform developers themselves) to extreme lengths of hacking to get stuff working properly. (BTW, just so everyone knows, I'm not picking on Fabrizio here – he's just the one who appears to have pushed the envelope more than anyone else vis-à-vis improving the composability of the ant-based RCP build process.)

One great thing about the NBM Maven Plugin is that it cuts this knot quite elegantly, making it possible to treat NetBeans Modules (NBMs) as first-class citizens within the maven world. So, if you have a maven repository that contains NBMs (like this one hosted by the NetBeans folks themselves), you can readily add NBM dependencies just like you would jar dependencies from maven central:

<dependency>
   <groupId>org.netbeans.api</groupId>
   <artifactId>org-openide-nodes</artifactId>
   <version>${netbeans.version}</version>
</dependency>

...and the NBM plugin will take care of using those NBM dependencies as appropriate:

  • injecting the NBMs' associated jars into the project's compile classpath
  • adding the NBMs as runtime dependencies of whatever NBM(s) your project/application produces
  • adding the NBMs to the (optional) "update site" associated with your NB RCP application (making remote updating of that application in the field trivial)

And, to complete the cycle, the nbm-maven-plugin provides an nbm packaging type, so that you can build NBMs independently, deploy them as you'd expect, and then compose them without any ceremony into however many NB RCP applications you'd like. No suite-chaining, no special platform or cluster artifacts in every build environment, nothing at all different from what one is used to in any other jvm/maven environment.

The Rough Spot

All of the above works flawlessly (at least it has for me in my ~month of usage). The key prerequisite though, is having access to a repository that contains the Platform NBMs that you'd like to use. The repository that I linked to above does not track NetBeans releases in lockstep (e.g. at the time of this posting, the http://bits.netbeans.org/maven2 repo has NBMs from NetBeans v6.5 and v6.7, but not v6.7.1, or the recently-released v6.8). The solution is to populate your own maven repository with those NBM artifacts.

Deploying NetBeans Platform artifacts to your own repository

This might have been a tedious process, were it not for another handy goal from the NBM Maven Plugin, populate-repository, which will push all of the artifacts produced by a NetBeans Platform build (the NBMs themselves, their sources, javadoc, and appropriate non-NetBeans dependency metadata) into your own maven repository.

There's a fair bit of configuration and setup that goes into this though. A HOWTO is provided by the nbm-maven-plugin project, but there are a number of things that it leaves unspoken. So, here's a dump of what I did to successfully populate a Nexus maven repo with a full set of NetBeans Platform artifacts:

  1. Pull the NetBeans Platform sources from the associated hg repo (I used the release68 repo, as we're targeting v6.8 of the NB RCP now). It appears that populating your repo with NB RCP artifacts from a binary download is possible, but then you'll not have the associated javadoc, source artifacts, etc.
  2. Build the entire project – I'm sure it's possible to restrict the build to certain clusters, but I don't see any reason to optimize this process since doing so only saves a little bit of disk.
    1. You must set your JAVA_HOME environment variable to point to a Sun JDK, especially in linux environments that often come with non-Sun JDKs (I'm looking at you, Ubuntu, with your cute gcj JDK). Not doing this will result in very strange compilation errors.
    2. You must set your ANT_OPTS environment variable to specify a higher-than-default maximum heap (export ANT_OPTS=-Xmx1024m worked for me).
    3. Within the top-level of your NetBeans Platform source checkout, run ant; ant nbms build-source-zips build-javadoc – this will build everything you care about in order to populate your maven repo.
  3. You want the NBMs in your repository to have appropriate dependency relationships established with third-party artifacts, right? Achieving this is easy if you have Nexus:
    1. unzip sonatype-work/nexus/storage/central/.index/nexus-maven-repository-index.zip somewhere (I used /tmp/nexus-index).
    2. set the nexusIndexDirectory property in the last step to the path where you unzipped central's index; the nbm-maven-plugin will search that Lucene index to find dependencies referred to within the Platform's NBMs
  4. set MAVEN_OPTS to specify a higher-than-default maximum heap (export MAVEN_OPTS=-Xmx512m worked for me). I'm not sure why this would be required, but I got OutOfMemoryErrors with max heap set to anything less than 512MB. Perhaps searching the maven central repo index is what pushed allocation so high.
  5. Make sure you don't have a pom.xml in your current directory. Bad things will happen.
  6. Decide on a version number for the deployed artifacts, and use it as the value of the forcedVersion property. I used RELEASE68 to go along with the pattern established at http://bits.netbeans.org/maven2; 6.8 makes more sense to me, but if/when the NetBeans maven repo comes up to date with the NetBeans release schedule, sticking with their convention will allow us to use that authoritative repository with no changes to our projects.
  7. Assuming you're deploying to a release repository, make absolutely sure that you've (temporarily) enabled redeployment for that repository! nbm-maven-plugin deploys some NBMs multiple times (presumably while traversing various dependency graphs), and not enabling redeployment will result in errors (400 errors from Nexus, specifically – I can't say what might happen with different repository managers).
  8. Now for the big finish:
    mvn org.codehaus.mojo:nbm-maven-plugin:3.1:populate-repository -DforcedVersion=RELEASE68 -DnetbeansInstallDirectory=nbbuild/netbeans -DnetbeansSourcesDirectory=nbbuild/build/source-zips -DnexusIndexDirectory=/tmp/nexus-index -DnetbeansJavadocDirectory=nbbuild/build/javadoc -DnetbeansNbmDirectory=nbbuild/nbms -DdeployUrl=<nexus_repo_url> -DskipLocalInstall=true

Whew! Let that sucker run for a while, and you should be left with a maven repository fully populated with NetBeans Platform artifacts.

Posted at 28 Dec @ 2:35 PM by Chas Emerick | 2 comments

blog entry  2009/12/30
Last changed: Jan 23, 2010 07:32 PM by Chas Emerick

Of course, I'm not so daft as to say that, but:

If you use an imperative programming language that provides for mutable state, that's what you are saying.

For some background, I read this article yesterday, which contains this choice passage (emphasis mine):

Imagine you've implemented a large program in a purely functional way. All the data is properly threaded in and out of functions, and there are no truly destructive updates to speak of. Now pick the two lowest-level and most isolated functions in the entire codebase. They're used all over the place, but are never called from the same modules. Now make these dependent on each other: function A behaves differently depending on the number of times function B has been called and vice-versa.

In C, this is easy! It can be done quickly and cleanly by adding some global variables. In purely functional code, this is somewhere between a major rearchitecting of the data flow and hopeless.

A comment on proggit very concisely summed up just how crazy the above passage is:

Considering that one of the majors reasons to use FP is so that you don't have such inter-dependencies, it's odd to point that out as an issue.

The whole problem with imperative programming is that state gets threaded everywhere, and you can't look at any function individually and know how it will behave. I won't even go into problems associated with concurrency, where state becomes incredibly difficult to reason about if you allow that sort of thing.

I really appreciated the notion of imperative programming "threading state everywhere". Let's drive the point home, though.

Hey, I'm just the messenger

Consider a method you might see in any Java application (I oh-so-love the jvm, so I get to pick on Java; the same sort of thing applies in C, C++, C#, python, ruby, perl, et al.):

public void doSomething (String arg1, int arg2, FooBar arg3) throws IOException;

Simple enough, right? Hey, we're programming, life is good. But, what if you saw a signature like this:

public void doSomething (String arg1, int arg2, FooBar arg3, .....,
                         String arg316) throws IOException;

316 arguments to a method (which I don't think is actually possible in the jvm, but bear with me)? "That's absurd!", you'd say. The problem, of course, is that the 3-arg doSomething actually has far more arguments than its signature implies:

The behaviour of every function in a mutable, imperative environment is dependent upon the state of all of the other (variables|attributes|bindings|whatever) in your program at the time the function is invoked.

So, if you have 313 other variables in your program, that 3-arg doSomething is functionally (ha!) operating over 316 arguments.
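The same effect is easy to reproduce in a few lines. Here's a sketch in Clojure rather than Java (all of the names are made up for illustration), using an atom as a stand-in for any mutable field or global:

```clojure
(def call-count (atom 0))   ; one of those "313 other variables"

(defn do-something
  "Two declared arguments, plus a hidden one: whatever call-count holds."
  [arg1 arg2]
  (str arg1 ":" (+ arg2 @call-count)))

(def before (do-something "a" 1))   ; "a:1"
(swap! call-count inc)              ; some far-away code mutates the state
(def after (do-something "a" 1))    ; "a:2"; same arguments, different answer

(defn do-something-explicit
  "The same logic with the hidden dependency promoted to a real argument."
  [arg1 arg2 n]
  (str arg1 ":" (+ arg2 n)))

(def explicit (do-something-explicit "a" 1 0))   ; "a:1", every time
```

Nothing in the signature of do-something tells you that call-count is involved. The explicit version has the longer argument list, but it is the honest one.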

Would you ever intentionally write a method signature that takes 316 arguments? Would you use any library that contained such a function signature? No? Then why are you using tools that force such craziness upon you?

Postscript

Of course, there is a place for mutable, imperative programming. The fellow who wrote the blog post to which I linked above appears to work on games, one of the few places where one could unapologetically use an imperative programming language with mutable state. Update: Looks like the state-of-the-art in game programming is heading towards FP languages more than I thought. Thanks to this comment, here's a LtU thread, with slides, about the guys who wrote Gears of War and the Unreal engine recommending FP as the future of game development.

However, we need to collectively get past encouraging other software developers – the vast majority of whom do not have the particular requirements of game, systems, or embedded development – to inflict the pain of imperative languages and mutable state upon themselves, especially given the concurrency challenges that lie ahead (never mind the general problems such environments present, as I argue above). The languages are ready, the runtimes are widespread...let's stop doing it wrong.

Posted at 30 Dec @ 7:00 AM by Chas Emerick | 26 comments

Founder, Snowtide Informatics

About Me

I'm the founder of Snowtide Informatics. We make DocuHarvest, a web application that turns your valuable documents into data, and PDFTextStream, a PDF text extraction library for Java and .NET. I do a lot of programming in Clojure and just a little in Java, trying to make it easier for people to make unstructured content just a little more useful.
