In a past life as a "software engineer" on contract, a favorite analogy of coworkers was to compare software development to construction (perhaps influenced by the early 2000's housing boom?). Projects were like houses, plans were needed to properly architect the end result, and schedules were laid down to ensure that the "foundations" were done before we started "framing" the building and "furnishing" the details. Conceptualizing software development in such a way is common, and there's a long history of people involved in what has casually been called "software engineering" thinking that what they do is, or should be, related to "old-world" engineering.
This is largely nonsense, and decidedly unnerving.
I'm not like my grandfathers
Both of my grandfathers were involved in engineering; knowing something of what they did makes me even more sure that what I do is not related to engineering.
One was an electrician, more a tradesman than an engineer, who worked in the construction of large commercial buildings in downtown Hartford and Denver. Each day, he would look at the blueprints painstakingly drafted by the project's architects and engineers, and go about making those plans a reality – installing high-voltage switching equipment, stringing miles of Romex (or, whatever they used back then), and doing all of it in hazardous conditions.
My other grandfather was, as far as I remember, something of a process engineer at a subsidiary of Olin, helping to design the manufacturing processes that would heat, pound, roll, cut, stamp, and test thousands of varieties of copper and stainless steel foils, strips, and other formulations for later inclusion in all sorts of products, both industrial- and consumer-related.
These men's careers were very different, but they were involved in what are clearly engineering tasks.
An art's constraints are what define the art
There's a lot that separates my discipline from my grandfathers', but I think the most significant is that, as someone who builds software, I have far more discretion in how I achieve my end results than they had in their work. The degree to which this is the case cannot be overstated, but I'm at a loss for words as to how to concisely characterize it. Materials in the real world behave in ways that are, in the modern age anyway, understood: electricity and copper and steel and wood have known physical characteristics and respond in known ways to all of the forces that might be applied to them.
In contrast, the world of software has so many degrees of freedom in so many vectors that the possibilities are functionally limitless. This is both a blessing and a curse, as it means that the programmer is something of a god within her domain, free to redefine its fundamental laws at will. Given this context, it's no wonder that software projects fail at an astounding rate: we simply have nothing akin to the known natural constraints that are conveniently provided for our real-world engineer friends, so we have no option but to discover those constraints as we go along.
The software community's response to this has been to erect artificial constraints in an effort to make it possible to get things done without simply going insane: machine memory models with defined semantics, managed allocation, garbage collection, object models, concurrency models, type systems, static analysis, frameworks, best practices, software development methodologies for all stripes and inclinations. This is natural, and good, and the only way we could possibly make sense of this thing called software given how young it is.
Yet, even after all this edifice, ask 100 software developers how to build a website, and you will get at least 500 answers – and the people that respond with a hesitant "It depends" are likely the most clueful of the group. If my grandfather had responded "It depends" to a question about how to produce a 1-ton spool of 2mm-thick copper strip, he'd have been fired on the spot.
Susur's work is remarkably intricate and exhibits a delicate precision unmatched by his peers. He prefers the crystalline perfection of a perfectly-balanced type system, and so chooses Haskell for most of his work.
Living up to his "Obi-Wan" moniker, Jonathan makes the most familiar things amazing, and people are never sure how. His secret is that he usually works in some lisp or scheme, which he then uses to generate whatever comforting form his customers prefer, often Java or C#.
Marcus likes to surprise people with the most esoteric things possible: his pièce de résistance is written using his own prolog variant, implemented in Factor. People love it either for the ballsiness of it all, or because his stuff works so well they don't notice.
Rick is a madman, insisting on using C++ for everything, even web applications.
Okay, so software development isn't engineering. It's probably safe to say it's a craft, though (in contrast to pure art like painting or sculpting, but that's a different blog post). In particular, its similarities to cooking are legion, something that I noticed while watching one of the few bits of television I bother with (and a guilty pleasure), Top Chef (though I redeem myself oh-so-slightly by preferring the Masters incarnation by a mile). *blush*
Watching this show with an eye towards the methods and attitudes of the chefs is like watching a group of programmers struggle to build quality software. Functionally no constraints in process, methodology, and materials? Check. Infinitely many ways to get the job done depending on the skill of the craftsman, the tastes of the customers, and the specifics of the materials? Check. Identifying the best craftsmen is difficult, and most of those at the top of their field have a murky combination of ill-defined qualities? Check. A wide disparity between the effectiveness of individuals in the field? Check. At the edges, people experiment with hacks that sometimes come to be part of everyone's repertoire? Check. Technical capability is not a necessary requirement for success, due to fluctuating trends (less charitably called "fashion") and a variety of potentially-compensating personal characteristics? Check. Less mature (and often less capable) members of the community are given to snotty ego trips, bad attitudes, and frequent tantrums? Check.
Given all of the similarities, I think the fact that it's difficult to assess the quality of individuals in either field is the most striking (and perhaps a key indicator that a field has not internalized, or intrinsically cannot internalize, a certain minimum degree of rigor in its methods). It's telling that TopCoder predated Top Chef by a decade. Great hackers and those guys that can hit the high notes are few and far between, mixed in with scads of cooks that come to work stoned and "programmers" that put HTML on their résumés yet still manage to get hired, somewhere. These are fields that are far from science, far from engineering, and well within the lush, rolling hills of craft.
Enough already. So what?
First, words matter and it's important to call a spade a spade. Thoughtlessly (or disingenuously) call software development "engineering", and people walk off with all sorts of notions that are inappropriate. This is particularly damning with customers and other nontechnical "civilians".
Second, craft, as revered a concept as it is, is not desirable when businesses, livelihoods, and actual lives are at stake. I've never worked on aeronautics software or the like, but despite its codified standards, its rigorous testing protocols, and access to millions and billions of dollars in resources, we've crashed satellites into planets because of something as triflingly simple as conversion between metric and standard measures. Similar examples are everywhere. We need software development to be built with an engineering discipline, because everything we do depends upon it.
Of course, I have no tidy answers for that challenge. I think that if we can pull ourselves out of this primordial ooze of twiddling bits and get to a point where we describe how to do things relevant to the domains we're working in, then there's a chance for the species to survive. Ideally, domain experts should be doing the bulk of the "programming" work, given that communication between such experts and programmers is probably the source of at least half, if not the vast majority, of software design flaws. This is an old chestnut, dating back to the 4GL programming languages (now 5GL?), declarative programming, expert systems, and so on.
Programmers' work will be done when we've put ourselves out of a job
One strain of real-life examples – such as Pipes, DabbleDB, and WuFoo – allows people to collect and process data without touching a line of code. The best specimens of these kinds of systems are often labeled with the "tool" slur by programmers, but the fact is such things allow for greater aggregate productivity and less possibility for error than any similar purpose-built application. Meta-software that can deliver the same for any given domain is the future, and the result will be that programmers, as they are known today, should cease to exist in commercial settings.
The craft of programming will survive though, and continue to be the source of innovation. After all, there's no shortage of demand for great chefs, even with all these McDonald's and Olive Gardens dotting the landscape.
Having worked primarily with PDF documents and all the minutiae of their fonts and such over the years, I've come to have a great appreciation for typography. This appreciation has led me down some interesting paths, most notably when I visited the Wilson Printing Office a few years ago, originally built in 1816 in what is now Old Deerfield Village (about a half hour's drive north). It's a quaint old building in a quaint old village, exactly what you'd expect in New England:
Inside, you can find a very old, manually-operated, movable-type letterpress. It's entirely functional, and luckily, visitors are allowed to operate the behemoth.
I was reminded of this recently when I stumbled across this video, where the proprietor of Firefly Press (located in Somerville, MA) talks about his love of letterpress, the state of his craft, and how he expects it to die eventually, simply because people will forget how to do it:
The artfulness, the care, and the precision of the work exhibited there is remarkable. Seeing it makes me want to retire and build a letterpress from scratch, and start pumping out lovingly-crafted stationery and such (although remember, self-sufficiency is the road to poverty).
As Sisyphean as it might seem, I try to bring as much of that spirit as I can to what I do. Despite the sometimes soul-sucking pop culture of software development, the drumbeat of get-it-done-fast that comes on every vector, and the never-ending treadmill of "new" technologies that parade across social news sites, I try to bring a craft to the code I write, the systems I build, and the experiences I assemble for my customers.
I'm heartened that it seems that I'm not alone in this. There are many like me that seem to have re-discovered what's important and relevant to building sustainable systems – and discovered that, yes, it's possible to keep that separate from the trendy, the immediate requirements, the moment's conveniences. Computation, after all, doesn't appear to change much. Lambdas and pointers are likely to be there, waiting for future generations just as they serve us today – modulo some (hopefully slight) packaging that helps with interacting with the broader world.
(It appears that this perspective may be necessary [though surely not sufficient!] in order to build successful, and not merely elegant, systems that solve today's and tomorrow's pressing problems. No one will cheer much anymore when an application is delivered that is largely built while sitting on one hand [a Spolskyism there, I believe, referring to IDE wizards and such].)
Something said in that video really rang out to me, reminding me of the traditions of LISP that exist, and where I happen to stand in relation to them:
The old guys got it remarkably right.
Of course, the old vanguards have faded for the most part; many have turned to Clojure and other modern lisps. That sense of craft and the original intent and spirit of "the old guys'" work is there and alive.
A quick search for 'letterpress' uncovers a host of shops, plying their craft, making beautiful things out of cloth and cotton and wood and steel. May that continue to be the case 100 years hence, too.
Getting valuable data out of documents should not require an I.T. staff, outside consultants, building or buying software, or an up-front investment of hundreds or thousands of dollars, regardless of how many documents and how much data is involved.
This may seem strange to hear coming from me: you may or may not know that I've been principally involved in selling PDF content extraction software for the past six years. Over that time, I've had the opportunity to come face-to-face with hundreds of content and data extraction challenges across dozens of industries. If there's one takeaway I can offer up from that experience, it's this:
No one cares about the process of data extraction: people only care about their data.
Seems simple enough, but ask anyone who's been involved in any kind of data integration project, or tried to help a nontechnical user get useful data out of a directory full of documents, and you'll know that people are forced to care. The situation is worse for e.g. small business owners and others that simply can't afford additional software and the attendant consulting hours.
DocuHarvest is an alternative path: a web application that provides data extraction and content conversion services through the browser, usable by everyone, costing pennies per document processed. There's even a free option, if you're willing to process only one document at a time.
Available Now
We're starting small, offering three types of document processing jobs:
DocuHarvest currently only accepts PDF documents as input, but that will change relatively soon – PDF just happens to be where we "come from", so we're rolling that out first. Support for additional file formats will come.
In addition, we have a variety of additional types of jobs in the pipeline, including:
conversion of documents to images (rasterization),
extraction of embedded images, and
thumbnail generation
That's hardly a comprehensive list. This is just the beginning; we have a lot of tricks we've saved up over the course of those six years. :-)
If you have any feedback, comments, questions, suggestions, or complaints, don't hesitate to contact me; leave a comment below or in the feedback boxes on the DocuHarvest site, message me (@cemerick) or @docuharvest on Twitter, or email me directly.
In my last post, I solicited the Clojure community to participate in a short survey to determine a few things in particular:
Which language/community have Clojure programmers "come from", or are primarily using now, if not Clojure?
In which domain(s) is Clojure being used?
To what extent is Clojure being used commercially?
What are Clojure's biggest weaknesses at this point in its development?
I tossed in a few other questions as well, but determining the above was my primary motivation.
I'm going to run through some highlights and my key takeaways from the data that was gathered, along with some potential TODO items that the Clojure community might want to focus on over the coming months. See the link at the end to get to the raw data to satisfy all your statistical urges.
Responses and Context
The survey accumulated 487 responses, which I think is a hefty sampling (darn close to a typical political poll, for what that's worth!), and a very strong turnout given the brief time the survey was held open (roughly three days) and the meager promotional efforts that were made (two posts to the main Clojure mailing list, and various people twittering). Note that results were not available while the survey was open – my attempt at limiting any incentives to respond multiple times in an attempt to game the results.
In any case, I'd say that the results are solidly representative of the Clojure community (though I'm hardly a professional when it comes to surveying, polling, or data collection in general, so I'm sure there are plenty more issues with the "methodology" used here). Some caveats warrant mention though. The nature of the survey and its promotion ensured that only the most "connected" Clojure programmers would have been aware of it; I suspect there's an entire class of potential respondents that simply don't frequent the mailing list or bother with Twitter. Further, there's certainly some self-selection bias going on – people who are less enthusiastic about Clojure would be less enthusiastic about spending time answering the survey's questions.
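To put a rough number on that political-poll comparison, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample, which this survey decidedly is not given the caveats above, so treat the result as a best-case figure rather than a real error bar. The function name `margin_of_error` and its defaults (z = 1.96 for roughly 95% confidence, p = 0.5 as the worst case) are illustrative assumptions; only the 487 comes from the survey.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample of size n. p=0.5 is the worst case
    (widest interval); z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


responses = 487  # number of survey responses reported above
moe = margin_of_error(responses)
print(f"n={responses}: +/- {moe * 100:.1f} percentage points")
```

Under those assumptions the figure comes out to roughly 4.4 percentage points either way, in the same neighborhood as the roughly 3.1 points you would get from a poll of 1,000 respondents.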
With that out of the way, let's get on with it.
Highlights and Summary
What follows is a question-by-question summary, starting with the quantifiable stuff first. I'm also adding in my interpretation and thoughts, especially where the data indicates a weakness in the language, community, or other areas. I'm recreating the charts here because the charts in Google's "summary" view seem to be falling down, especially when it comes to the language-related questions that had a few dozen options.
How long have you been using Clojure?
It seems clear that the Clojure community is growing, and growing fast. I'll leave it to others to speculate on a specific growth rate, but it would appear that the curve is tilting far past 2x in Clojure's third year. No shocker there.
How would you characterize your use of Clojure today?
The only thing I can add here is that I'm surprised that Clojure isn't being used more in academic settings, at least in relative terms. Students becoming familiar with Clojure in school is the tip of the spear when it comes to Clojure being used more widely in commercial settings, so it might be worthwhile to think about what might specifically address students' and professors' academic requirements. Some of the comments further on about usability and ease-of-setup might be applicable.
What is the status of Clojure in your workplace?
These results make me quite happy: more than half of all respondents, 53%, are using Clojure at work or are lobbying to be able to do so. Clojure and its various libraries make for a decidedly "serious" development environment, and people haven't been shy about introducing it into their workplace. These results are remarkable for a language that's so young, and perhaps confirm that the historical prejudice against lisps is waning somewhat (which might turn out to be a demographic/generational issue from here on out, or so we can hope).
In which domain(s) are you using Clojure?
Random thoughts:
I was actually surprised that web development is the big winner here. I suppose no matter what else one does, web development is necessary at some point – and the existence of frameworks and libraries like Compojure/Ring, Enlive, and compojure-rest certainly makes web development with Clojure a joy.
I knew people were doing a lot of maths with Clojure, but...wow! This is probably mostly attributable to David Liebke's excellent Incanter library.
The fact that RDBMS usage is trailing usage of non-relational data stores (I try to avoid using the cringe-worthy "NoSQL" moniker as much as possible) is interesting. I take that as an indication that Clojure might be getting more use in new/green-field projects vs. being integrated into existing (call 'em "legacy"!) projects, which certainly wouldn't be surprising if true.
A quick scan of people's "other" domains doesn't reveal any significant domains that I missed in the main list. There were 2-3 mentions of "music" here and there, which is interesting.
Which environment(s) do you use to work with Clojure?
Note that respondents could choose more than one option here, so results add up to more than 100%.
Clojure development tools have been a favorite hobby horse of mine for some time (see my scratchpad about The Ideal Clojure Development Environment), and those that know my preferences can bet I'm eating my hat on this one.
Emacs ran away with it, actually more so than I expected, in use by 70% of all respondents.
I'm really surprised by the number of people using plain vanilla command-line REPLs.
vi also has a larger share than I expected, as it's certainly not traditionally considered a good lisp editor (at least as far as I'm aware). It even outpaced usage of each of the Java IDE plugins.
Eclipse + Counterclockwise is the winner in the IDE category, followed by NetBeans + Enclojure and then IntelliJ + La Clojure.
In the "other" category, there were a few mentions of TextMate (!), as well as mentions of Maven (presumably via the clojure-maven-plugin) and Leiningen.
I continue to maintain that broad acceptance and usage of Clojure will require that there be top-notch development environments for it that mere mortals can use and not be intimidated by...and IMO, while emacs is hugely capable, I think it falls down badly on a number of counts related to usability, community/ecosystem, and interoperability. But, I'm not here to harsh on emacs (at the moment!). Let's get our ducks in a row:
To restate: broad acceptance and usage of Clojure will require that there be top-notch development environments for it that mere mortals can use and not be intimidated by.
The various IDE plugins are simply being outpaced by the emacs world in terms of raw editing capability. There are plenty of reasons to believe that this will continue to be the case (emacs has a bit of a head start in that department!), but I assume the plugins will continue to improve in this area.
It is possible that the IDE plugins are in an acceptable state right now, and that emacs is simply a more natural/acceptable environment to Clojure early adopters for a variety of reasons. Even if the latter is true, this is a dodge: there is still a wide editing capability gap that must be narrowed at the very least, and as the respondents' comments (highlighted below) show, the IDE plugins are not up to par.
Finally, there may be a matter of timing, or simply of time passing. The longer Clojure goes without an "accepted" "mainstream" development environment (super-big scary scare quotes there!), the more people who discover Clojure for the first time get the impression that emacs is the only game in town. And, when it comes to first impressions like that, it can take years for new developments to percolate back into people's understanding. So, making this situation better – so that no matter who you are or where you come from or what your personal preferences are, there's a Clojure development environment for you – should be a top priority for the community, insofar as broad adoption is a priority. If you're interested in helping, get started with the existing Clojure environments, and then go talk to Eric and Laurent and find out what they need.
Clojure is primarily a JVM-hosted language. Which other platform(s) would you be interested in using Clojure on, given a mature implementation?
"Other" is the big winner here, almost garnering a majority. The most commonly-noted "other" host/target was LLVM, followed by other mentions of "native", "portable C", and Parrot. I actually intentionally didn't list LLVM/C; the nature of Clojure as a hosted language means that a lot of its core functionality (threading and concurrency primitives, GC, networking, standard library stuffs, etc, etc) comes from the host. Emitting C from Clojure source is probably a perfectly reasonable thing to do, except I'm not sure where all the great facilities that people associate with Clojure would come from; you'd have to have some standard libraries that cover a fair bit of what Java and the JVM provide in order to have a reasonable native Clojure target. I would think that targeting a Scheme (Gambit, perhaps) that emits C yet has a rich standard library would be far, far easier. Perhaps there's a wrinkle or easy out in this department that I'm not aware of.
(A full and proper Javascript Clojure implementation probably isn't possible either, given the limited/crippled execution environment Javascript usually finds itself in. Given that, perhaps my biases were evident in the options I listed here.)
Another note about the "other" responses is that the need for "quick startup", sometimes in connection with desktop application deployment, was mentioned a number of times. Perhaps this issue will be alleviated with the quicker JVM startup times promised for JDK 7. Tighter integration with nailgun could also help there.
What language did you use just prior to adopting Clojure – or, if Clojure is not your primary language now, what is that primary language?
I'm only showing options that garnered more than 1% of responses; this was not a multiple-choice question.
Java, Ruby, and Python have the largest representation here, with Java's share double that of Ruby, the next most common response. This seems exactly right to me, based intuitionally on the time I've spent in #clojure and on the mailing list.
Interestingly, very few people have come directly from Common Lisp and Scheme. Those folks are presumably happy with their current environment.
I think it's safe to say that Clojure is a cultural melting pot – the disparities in attitude, priorities, and domains among Java, Ruby, and Python developers (never mind you Ada expatriates!) can be vast. As the community grows, ensuring that the Clojure culture remains distinct, integrated, focused, and friendly will likely continue to be an important challenge. This is something that everyone can do something about:
help your neighbor in #clojure and on the mailing list, especially those new to Clojure
remember that it's possible to disagree without being disagreeable
share and share alike, through contributions to Clojure open source projects, by sharing news of your successful usage of Clojure, and by offering up experience you've earned in your other/previous domains and environments
If Clojure disappeared tomorrow, what language(s) might you use as a "replacement"?
Again, only showing responses that cracked 2% of responses; respondents could choose multiple languages.
There's a lot of really interesting bits here:
The big winner here is functional programming. Erlang, F#, Haskell, and Scala are all well-represented.
Not many people are interested in using Java without Clojure, at least relative to the number that came to Clojure from Java.
Big numbers for Common Lisp and Scheme, even though very few people came to Clojure from those languages. Clearly, Clojure has (re-)introduced lisp to a lot of people, and they like what they see. Those lispers that think Clojure is "evil" (especially in the CL community – and yup, I've heard exactly that term used, though I won't bother helping their authors with a link) should pay attention. You're welcome.
What do you think is Clojure's most glaring weakness / blind spot / problem?
Time for us to take our medicine. This question took textual responses, so I'll leave it to others to come up with some categorization of the data. I encourage you to go look at the full set of data and read through the answers to this question, especially if you lead, are involved, or help out with a Clojure project. There are likely tidbits there that you need to pay attention to.
Here are the most common issues I see from scanning the results, which I've tried to list in order of their prevalence:
Poor / incomprehensible error messages and stack traces are far and away the most common complaint.
Documentation is an issue for many – lack of clarity ("A lot of documentation assumes you are coming from either a Lisp or Java background, which isn't the case for some of us."), needing to go multiple places for docs, out of date screencasts, etc. This is a hard problem, relative to other issues listed here.
Getting started is a sore point – going from zero to having a fully-working environment with libraries, build tools, documentation, et al. properly set up is a nontrivial task for most
Related to that is lots of sentiment that better non-emacs IDE support is needed ("Richer support for non-emacs IDEs", "A good IDE that's EASY TO SETUP!", "My company would easily pay several hundred dollars a seat for a hassle-free powerful IDE for Clojure", "learning emacs and slime seems like a requirement", etc), as discussed ad nauseam above. While there are a lot of short, one-off comments about the failings of error messages and documentation, there are a number of (sometimes amusing) rants on the topic of development environments and editors. If you care about this issue, read them.
Some complain of a "lack of comprehensive Windows support"; I'm not sure what's lacking on Windows, but the community is decidedly linux/unix/Mac biased, so I'm not entirely surprised that this is mentioned.
slow startup time / difficult to use for scripting
Points where Java juts up into Clojure where people would prefer otherwise is a common stumbling block (e.g. "[You end up needing] to know when you're dipping into the Java world or not. Some of this rift is explicit, like when calling a Java method, but others are not. Protocols is a good example of this."). With this I'd group various comments about classpath management and wanting TCO, continuations, and other features that would be host-provided, but which the JVM doesn't provide.
Constant flux in APIs is a difficulty. Related to that, a proposal: "How about stopping at 1.2 for a bit and letting everyone sync to it so the supporting tools and libraries can get stable."
Build tool diversity is an unwanted complication for most, with the main tension between Maven and lein. I personally think Clojure Polyglot Maven is the unified "compromise" path that, barring some other stellar solution or Sonatype gumming things up badly, should probably end up being the standard.
OSGi support/compatibility was mentioned a number of times.
There were a number of interesting solitary responses that I'll just quote directly, for fear of mischaracterizing them:
"I worry that Clojure is becoming too popular too fast and is in imminent danger of being 'enterprised,' so to speak, by people whose gifts are limited to the gift of gab and the gift of garb."
"The "EAI" enterprisey world at large uses Eclipse. I would say the biggest weakness for Clojure is really its biggest opportunity — make a completely badass mega-awesome Eclipse plugin for Clojure development. Counterclockwise is kind of a joke. Give it better code-completion, documentation lookup, trans-code navigation, etc. Give it a feature that lets developers easily create & "sneak in" Clojure-based code modules into existing, large EAI projects. This is how we can get Clojure's foot in the door. We can tell our pointy-haired bosses 'oh, it's just a java library that helps increase performance' – yes, the Clojure language permits this, but it would REALLY help if the tools promoted it."
"Too much emphasis on Concurrency and not enough on solving everyday problems as a general purpose language. The top of the clojure community needs to start putting out real code written to access DBs, service web requests, do scatter gather integration etc. In short show how to solve the every day coding most Java (or JVM) software engineers face. If I see one more graph theory, or factorial, or primes example I think I shall puke, quit, and just stick with Groovy (which has piles piles of code solving every day problems)."
Again, go look at the raw response data, and check out what people have to say. You may not agree with what's said, especially if you are a happy/satisfied Clojure programmer, but feedback like this is gold: criticism from people that are sticking it out with Clojure, despite the warts they feel it has.
General Comments?
OK, now that we've gotten the criticism out of the way, here's a smattering of the general comments people offered in the last question. Virtually all of it was positive, probably because I gave people a dedicated text area to complain in! Good things to hear:
We're in production with it since Jan 2009. Great language, great robustness and very sound concepts.
Life is good
clojure is awesome!
Clojure is putting life back [into] LISP.
Clojure is the best language on the JVM. Period. It currently lacks tool support and exhaustive documentation, but I hope that will come along as time passes. I'd also like to see some evangelism for use of Clojure in the enterprise.
I haven't been this excited about a new programming language in a long time
Love the community!
Clojure is awesome. It's like Lisp is alive again.
The Joy of Clojure is amazing. And Mike Fogus is very handsome.
Clojure is a lot of fun. I'd like to keep that as the language grows.
Clojure's JVM interop seems far superior to anything else, and its hard to see another language being such a natural fit for us and our problem domain.
Clojure made programming fun again.
Great job Rich and the Clojure community.
I've never been more excited about a new language.
Writing in Clojure makes me happy.
<3
Clojure is the best thing that's happened to programming since, uh, anything.
Clojure is making me fall in love with programming all over again
I have learnt a lot more by asking questions on the IRC then anywhere else...You guys are awesome.
Using Clojure makes me a better programmer.
go, go, go
I think Clojure is here to stay. I tried Scala for a while. No comparison. Tying a lisp like language to the JVM is genius.
Clojure is the first Lisp-alike I've encountered that I like instead of merely admire. Rich Hickey has performed a miracle in my books.
I like nachos. And Rich has fantastic hair.
And finally:
keep up the good work trying to keep clojure community organized. It's a thankless job. So thank you!
I'm not sure if I'm doing much more than gathering some useful data, all of which is from the great people in the community. So, no, thank you! :-D
Organizing the survey and this post has been a blast for me; I hope you enjoyed it as well. I presume I'll follow up and do something similar again; State of Clojure, Winter 2010, perhaps?
I have now been using Clojure as my primary programming language for almost exactly two years. Clojure 1.2 is nearing release. The Clojure community is larger than it ever has been, and shows no sign of slackening its growth.
It seems like now would be a good time to take stock of where the community is, how people came to use Clojure, and how it's being used in the world. To do that, I put together a quick, 9-question survey through Google Spreadsheets, embedded below.
Hopefully enough responses will come through that we'll be able to get a good picture of the current state of affairs, and maybe a little insight into where Clojure can and should make headway in the future.
The results from this survey are now available, along with my oh-so-enlightened interpretations. Enjoy!
As I briefly mentioned in my last post, I've been working with Pallet to enable automated administration of, among other things, CouchDB. If you're wondering why I'm using Pallet instead of, say, Puppet or Chef, you can read the "Why Write Another Tool?" section in Hugo Duncan's recent post on Pallet. My answer to that question is that I wanted a tool that would provide automated:
Provisioning,
Administration & configuration, and
Application deployment
...all in one piece of kit that would neatly interoperate with the rest of our development stack (JVM, Clojure, Maven, Hudson, etc., etc). Pallet is the only option I found that threads that needle.
From bare metal to ready-for-production app deployment in 5 minutes or 5 paragraphs...
Using Pallet, we can automate everything necessary to provision and configure the resources needed to run our application. The following code defines, spins up, and configures an EC2 node; the steps listed below correspond almost exactly with each line of the defnode configuration that forms the majority of the code:
Use a specific Ubuntu AMI on a particular instance size
Use a standard firewall / security group configuration
Configure an "admin user" with a specific username that has only one authorized key (mine).
Tweak apt so that it's "sane". <snark>I like being able to install useful software, so multiverse it is.</snark>
Install the Sun JDK
Install the Tomcat application server
Install CouchDB and set two properties in its local.ini file (one to disable the javascript view server reduce limit – don't ape that if you don't know what you're doing – and one to change its default storage location to a different directory).
Create the aforementioned CouchDB storage directory.
Deploy our application as the ROOT application in tomcat and restart it (I've omitted the part that sets security policy in the same block, which is what actually necessitates the app server restart).
(I've simplified certain things in this rendition, but what I've elided are details that are pretty esoteric and/or miscellaneous – i.e. installing unlimited-strength crypto policy files in the installed JDK, setting VM parameters for Tomcat, etc.)
(Note that jcompute is an alias for the compute namespace provided by the excellent jclouds library, which Pallet uses for cloud-agnostic infrastructure provisioning as well as cloud-specific stuff, like EBS volume and snapshot management, elastic IP management, etc.)
Want to spin up 10 nodes instead of one? Change {master 1} to {master 10}. Other changes are similarly straightforward. Want to deploy an application update to existing nodes instead of creating new nodes? Instead of using converge, execute (pallet.core/lift master :deploy).
There's obviously a lot going on behind the scenes, but this is what the day-to-day configuration and usage of Pallet looks like. Using it means that I never have to use a command line or fiddly manual AWS tooling like their console or ElasticFox, or cobble together some combination of Chef/Puppet with Capistrano/Fabric and a pile of shell scripts to get a complete provision/configure/deploy solution.
Huge thanks to Hugo (who let me play in his sandbox) and Adrian Cole (the crazy man behind jclouds) for making this all possible.
I ran into a couple of administration issues with CouchDB while working on support for it in the excellent Pallet project, so I thought I'd leave some breadcrumbs for those that follow.
(Note that these issues were experienced with CouchDB 0.10.0 on Ubuntu Karmic. They may be resolved in later versions of CouchDB or Ubuntu, but those are the versions we're targeting for now.)
Broken Directory Permissions
First, Karmic's couchdb package is broken, insofar as key directories that CouchDB uses don't have the right ownership or mode. The symptom of this is that CouchDB will not stop properly when one invokes /etc/init.d/couchdb stop. This is a known issue, and will hopefully be resolved for Ubuntu Lucid. Rumor has it that some versions of CentOS have the same issue.
Resetting the ownership and mode on all of those directories is a bit of a carpet-bombing, but it certainly won't do any harm, and it does the trick (adjust for the install dir you have, e.g. perhaps prefixing everything with /usr/local).
CouchDB only detaches when started from a full shell
This is where the world will learn that I'm mostly an idiot when it comes to shell stuff and sysadmin in general. Thanks go to Hugo Duncan for giving me a key hint that allowed me to get past this one.
In short, Pallet was doing the equivalent of this in order to invoke the scripts it generates for configuration management, etc. (assuming here that your user has NOPASSWD in /etc/sudoers):
So, we're allocating a tty, which many services need around in order to fork and detach properly (such as Tomcat via jsvc, for example). However, the CouchDB server that is started with this command dies along with the ssh session. Go ahead, give it a shot. If you really want proof, you can do this to see that the server is running before the session is closed out:
Of course, if you log into an environment with a full interactive session, starting CouchDB and then logging out will leave the server running as one would expect.
The solution is painfully simple in this case – just don't invoke /etc/init.d/couchdb start as an ssh exec command. Whatever you're using for configuration management, have it run in a full interactive shell session. That's exactly what Pallet is now doing for all of its configuration executions.
If you have to pick, choose function over form (at least when it comes to build tools).
Ahem. Sorry, let's start from the beginning.
Like any group of super-smart programmers using a relatively new language, a lot of folks in the Clojure community have looked at existing build tools (the JVM space is the relevant one here, meaning primarily Maven and Ant, although someone will bark if I don't mention Gradle, too), and felt a rush of disdain. I'd speculate that this came mostly because of XML allergies, but perhaps also in part because when one has a hammer as glorious as Clojure, it's hard to not want to use it to beat away at every problem in sight. Ruby has rake, and python has easy_install, so it seems natural that Clojure should have its own build system that leverages the language's stellar capabilities – "just think of how simple builds could be given macros and such", one might think.
I can sympathize with that perspective, and I admit that I, too, once thought that a Clojure-based build system was an obvious move. This notion runs off the rails pretty quickly for one reason:
You can either help reimplement all of these things – or, if you're lucky enough to have access to a build tool that has a community that has built all these things already, you can use that.
Handily enough, Clojure is a JVM language, so using all of the goodness that's been built up over the years in Maven-land is extraordinarily easy to do. This means you have to write less code, and you get to use more mature, well-tested, well-supported code and tools, allowing you to focus on building awesome Clojure apps, not dicking around with implementing shell invocation, or Java compilation, or deployment via scp, or whatever "simple" build task you need today that's been in Maven's quiver for 5 years.
As if that weren't enough, Sonatype has its Polyglot Maven project, where they are working on making it possible to drive Maven builds from your favorite language, be it Clojure, Ruby, Groovy, or Scala. For now, I stick to using XML POM files (they're incredibly well-supported by tons of JVM-land tools – code completion on dependency version numbers FTW); while I love s-expressions, I'm more than happy to trade off a pinch of syntactic elegance in exchange for tons more capability.
If you're going to use Maven for your Clojure builds, here are some links:
Please make sure you check out the documentation on clojure-maven-plugin, which is where all of the Clojure-specific goals come from.
You'll do yourself a world of good by keeping the Maven books ready at hand (not the old one published years ago, BTW, but the newer ones available online or through Lulu). Yup, there's a lot of material there. No, you don't need to know it all to become super-productive with Maven.
We're building a web service for which we aim to charge money. Further, the data being pushed around may be confidential or otherwise of a sensitive nature. We have good reasons to do everything we can to ensure that the service is secured "properly":
We don't want to have customers charged for work that is requested by a bad actor exploiting a security hole (of course, we'd issue a refund and an apology in such a case, but the impact to our business through unnecessary processing could be sizable).
We don't want our customers' data exposed; common vectors for this include sniffing, replay attacks, or simply the use of compromised credentials.
Of course, the impact on our relationship with our customers due to any security breach could be significant and devastating – to our business, our reputation, and potentially even to our customers' affairs completely outside of their use of our web service. So again, we have a lot of reasons to be highly motivated when it comes to security.
By way of context, let's set the stage with regard to the moving pieces. The web service in question:
is built on a JVM stack (with the application itself built with Clojure, of course, using the Compojure framework)
has a user-facing, HTML browser interface as well as a "RESTful" API surface ("RESTful", as in, pretty darn close to ROA "style", so the set of URIs involved in delivering the user-facing interface vs. those delivering the REST API are nearly identical).
the user-facing interface offers standard form-based authentication, as well as OpenID authentication (which will be recommended only for more casual users and usage).
will always, always be delivered over SSL. We assume that every bit of data transferred is confidential, so cleartext is an absolute no-no.
OK, let's go find an expert
It is with this mindset that I've been digging into how to approach web service security. Note that I'm no specialist or expert in this area – I'm merely a practitioner that is usually focused on things far, far away from anything security-related. (It may not surprise you that I'm coming to appreciate that fact more and more as I learn about the "state of the art" in web service security.)
Given this, I set out a few weeks ago to see where things stand on the web service security front. Of course, that realm is just as full of cliques and posturing and strawmen and ad hominem attacks as the broader software development world is, so finding a clear path forward is not easy. First, a bit of literature review, as it were, drawn in particular from a flurry of web service security chatter a few years ago (emphasis here and there is mine; I wish I had noticed and grokked the indicated bits earlier, as I'll explain below):
I started by finding Gunnar Peterson's pair of posts where he compares "REST security" with WS-Security stuffs, where the former (especially approaches like HTTP Basic authentication over SSL) come out sounding like a pretty bad choice:
people who say REST is simpler than SOAP with WS-Security conveniently ignore things like, oh message level security
Now if you are at all serious about putting some security mechanisms in to your REST there are some good examples [such as Amazon's implementation of an HMAC authentication scheme].
Some people in the REST community are able to see the need for message level security so this is heartening somewhat. If the data is distributed and the security model is point to point (at best), we have a problem.
In summary, RESTful security, that is SSL and HTTP Basic/Digest, provides a stable and mature solution that addresses transport level credential passing, encryption, and integrity. It is ubiquitous, simple, and interoperable. It requires no out-of-band contract negotiation or a priori knowledge of how the resource (okay, service) is secured. It leverages your existing security infrastructure and expertise. And it addresses 99% of the use cases you are likely to encounter. SSL does not support message level security, and if that’s a requirement, then leveraging SOAP and WSS makes sense.
I am no way suggesting there is only way to do this or that WS-Security came down on stone tablets. I am also not suggesting that a NSA level of security is appropriate for Google Maps. There are many shades of gray. “good enough” security is a big challenge, and it isnt about black and white security models, it is about risk management
From Bill de hÓra:
I think this is where quantative analysis comes in and a measured assessement of the risk is taken. What has to be protected and what’s the worthwhile cost of doing so? Being software people, that’s beyond the general state of the art. We do gut feelings, flames and opinions.
There's a variety of "REST security 'best practices'" posts out there, but a question from StackOverflow links to a variety of additional discussions there that serve as good an indication as any that the accepted way of securing REST web services is Basic auth over SSL.
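Since Basic auth over SSL keeps coming up, here is a minimal Python sketch of what the Basic scheme actually puts on the wire. The two function names are mine, purely for illustration; only the header format itself is standard. It shows that the credentials are merely base64-encoded, not encrypted, which is why the SSL half of "Basic auth over SSL" is not optional.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    # Basic auth joins the credentials with a colon and base64-encodes them.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(header: str) -> Tuple[str, str]:
    """Recover the credentials; anyone who can read the header can do this."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only, since passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Round-tripping a header through `decode_basic_auth_header` hands back the original username and password, so without transport encryption anyone sniffing the connection gets the credentials for free.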
And now for a bit of hyperbole
Before moving on, I just want to point out that Bill de hÓra's comment above is sadly representative of so many corners of software development. Let's ponder that for a moment, while realizing that modern society and its continuation absolutely depend upon the software we build (I'm talking collectively, here).
Take a deep breath
Of course, the above is not an exhaustive survey, just the best tidbits I found over the course of a lot of browsing and searching. Here's the upshot, as I see it:
WS-Security et al. ostensibly provide message-level security that ensures that your messages can be passed along by untrusted intermediaries without being read or altered.
Standard HTTP authentication (generally Basic) over SSL transport is the de facto standard for securing REST services, but it does nothing for you if message security is important.
More sophisticated authentication mechanisms are available – in particular HMAC, as exemplified by Amazon's web services – which allow services to ensure that a message's author has not been impersonated. This would resolve the potential holes of .
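To make the HMAC idea concrete, here is a rough Python sketch of signing and verifying a request with a shared secret. This is not Amazon's actual scheme: the layout of the string being signed (method, path, timestamp, body digest) and the function names are assumptions of mine for illustration. The point is only that the server can recompute the signature and reject any request that was altered or that came from someone without the secret.

```python
import hashlib
import hmac


def sign_request(secret_key: bytes, method: str, path: str,
                 timestamp: str, body: bytes) -> str:
    """Return a hex HMAC-SHA1 signature over a canonical request string.

    The canonical layout used here is hypothetical, not any vendor's format.
    """
    canonical = "\n".join([
        method.upper(),
        path,
        timestamp,
        hashlib.sha1(body).hexdigest(),  # digest of the body, so it is covered too
    ]).encode("utf-8")
    return hmac.new(secret_key, canonical, hashlib.sha1).hexdigest()


def verify_request(secret_key: bytes, method: str, path: str,
                   timestamp: str, body: bytes, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret_key, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

A client would send the signature and timestamp alongside the request (in headers, say), never the secret itself. Guarding against replays would additionally require the server to reject stale or previously seen timestamps, which this sketch does not do.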
Unfortunately, I didn't grok the whole message vs. transport security issue as quickly as I should have, where SSL provides the latter but the former would only be satisfied by something like WS-Security (again, ostensibly, I certainly can't vouch for it) or HMAC-SHA1 if one were working in a REST environment. If I had come to grips with that point of tension earlier, I would have arrived at my two conclusions much faster:
In our situation, message security is simply not relevant. As Peterson wrote (and I quoted above) "If the data is distributed and the security model is point to point (at best), [REST has] a problem." Well, in our case, data is not distributed, it is transmitted point-to-point (between our customers and us, a third-party external web service), so transport security provided by SSL should be sufficient.
Here's the biggie: assuming we support form-based authentication (of course, over SSL) for browser-based UI interaction, supporting anything more sophisticated than HTTP Basic authentication over SSL for our REST API interactions would be a waste of resources. We could go full-tilt and require HMAC-SHA1 for the REST API or provide only a SOAP API that used WS-Security (and whatever else goes into that), but that would mean nothing if an attacker has the "REST API" provided for browser use available to him. Given this, transport security provided by SSL, and that alone, is simply all we can do. Put another way: when browser-level security mechanisms improve, then so will our APIs'.
An alternative path would be to host a parallel service, available via a REST API secured via HMAC-SHA1 or a WS-Security-enabled SOAP API, that did not provide any kind of browser-capable entry point. Customers could opt into this if they thought the tradeoff was important. Doing this would be technically trivial (or, perhaps only moderately difficult w.r.t. the SOAP option), but I've no idea whether the additional degree of security provided by such a parallel service would be of any interest to anyone.
By the way, if I'm totally blowing this, and my conclusions are completely broken, do speak up.
Coming soon: Part II of my investigation/thinking on the subject of web service security, related to OpenID and the management of credentials in general...which should give me all sorts of new opportunities to say foolish things!
Authored by Chas Emerick on Feb 19, 2010 02:07 PM