Welcome to Muck and Brass, the Snowtide blog site    

News from June, 2005

blog entry  2005/06/16

Every person has a particular set of experiences they search for when choosing an occupation. For me, I’ve always been fascinated with the act and process of discovery. Thankfully, helping to build and maintain PDFTextStream satisfies that fascination in spades, in ways that I never anticipated.

One would assume that working on a piece of software that extracts text from PDF documents would be pretty dry work. And, to a certain extent, it is: supporting all of the intricacies and minutiae associated with a complex file format like PDF is not the most thrilling software development work.

However, what can be exciting about the experience is how it forces me to be exposed to things that I never would have seen otherwise. See, in order to ensure that PDFTextStream works well and continues to do so as it is improved and changed, we have developed a suite of test PDF documents. These documents must be examined one by one, fed into PDFTextStream, and records of the documents’ logical structure and text content saved off into what are called ‘ground truth’ files. Then, whenever a change is made to PDFTextStream, our automated tests compare all of the preexisting ground truth files with what PDFTextStream provides after it has been changed. This process of constantly tracking the impact of changes to PDFTextStream is critical in ensuring that it continues to be robust, providing high-quality output.
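The ground truth workflow described above can be sketched roughly as follows. This is an illustrative Python sketch, not Snowtide's actual test harness: the `extract` callable is a hypothetical stand-in for whatever does the text extraction, the `.truth` file suffix is an assumption, and the "documents" in the demo are plain text files rather than real PDFs.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

SUFFIX = ".truth"  # assumed naming convention for ground truth files


def record_ground_truth(doc: Path, extract: Callable[[Path], str]) -> Path:
    """Save the extractor's current output for `doc` as its ground truth."""
    truth = doc.with_name(doc.name + SUFFIX)
    truth.write_text(extract(doc), encoding="utf-8")
    return truth


def find_regressions(docs_dir: Path, extract: Callable[[Path], str]) -> List[str]:
    """Return names of documents whose fresh output no longer matches ground truth."""
    changed = []
    for truth in sorted(docs_dir.glob("*" + SUFFIX)):
        doc = truth.with_name(truth.name[: -len(SUFFIX)])
        if extract(doc) != truth.read_text(encoding="utf-8"):
            changed.append(doc.name)
    return changed


def demo() -> List[str]:
    """Record ground truth with one extractor, then check a modified one."""
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp)
        (docs / "roster.txt").write_text("Pitt  Softball", encoding="utf-8")
        (docs / "scarves.txt").write_text("The Nature of Scarves", encoding="utf-8")

        # Hypothetical "before" extractor: returns the file content unchanged.
        def old_extract(p: Path) -> str:
            return p.read_text(encoding="utf-8")

        # Hypothetical "after" extractor: a change that collapses runs of whitespace.
        def new_extract(p: Path) -> str:
            return " ".join(p.read_text(encoding="utf-8").split())

        for doc in sorted(docs.glob("*.txt")):
            record_ground_truth(doc, old_extract)

        # Only the document whose output the change affected is reported.
        return find_regressions(docs, new_extract)


if __name__ == "__main__":
    print(demo())
```

In a sketch like this, a reported document is not necessarily a bug: it only flags that the output changed, so someone still has to decide whether the new output is an improvement (and re-record the ground truth) or a regression.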

The point here, though, is that the process of building up and maintaining our suite of PDF documents (which numbers in the thousands now) exposes us to documents from nearly every corner of human activity. That’s thrilling for me, as I get the chance to read about things that I never would have come across had I not been involved in PDFTextStream. For example, our test suite includes PDF documents like:

  • An issue of the newsletter produced by the National Multiple Sclerosis Society
  • A research paper describing CFS, a Cryptographic File System for Unix that was developed at AT&T
  • Various PDF versions of U.S. patents
  • A maintenance worksheet that describes how to apply and care for a particular type of asphalt emulsion
  • A whitepaper discussing various systems that help in managing spectral data
  • An essay by Seth Godin called Do Less that discusses the need to be selective in one’s entrepreneurial venture
  • An English translation of an al Qaeda training manual seized by the Manchester, UK police in a raid of an al Qaeda cell house
  • An article discussing options for 2D visualization of complex ontologies
  • The 2004 roster for the University of Pittsburgh softball team
  • A PDF version of a PowerPoint presentation about the excruciating financial minutiae of reinsurance
  • An article about how to safely set up and use tower scaffolding
  • A catalog of activities at the 2003 Melbourne Scarf Festival (who knew someone would ever host a lecture called “The Nature of Scarves”?)

As you can see, the list goes on and on and on. The world of human knowledge and experience is functionally infinite, but I love getting glimpses of obscure corners of it and making little personal discoveries. Pretty geeky, I know, but that’s not really surprising, is it?

Posted at 16 Jun @ 6:50 PM by Chas Emerick | 0 comments

blog entry  2005/06/19

Marketing is really hard, despite the rumors you’ve heard. The more I get into it, the more I’ve come to respect the skills (if not necessarily the tactics) necessary to deliver a message to prospective customers.

Up until this point, Snowtide has done virtually no marketing, and we’ve made out very nicely. We now have a mature product that really kicks ass. I’m proud of what PDFTextStream is doing for its users, some of whom simply would not be able to do their jobs if it weren’t for it.

But we’re past the point of working small niches. Scores of development shops, large and small, would have fewer bad days if they had PDFTextStream humming on their servers and in their products. So, the time has come to spread the gospel and make sure they know that.

To that end, we’re starting a new marketing strategy in July. It’s going to start slow as we find our footing (the conventional wisdom is that summertime sees a slowdown in corporate software purchases because of vacationing). It will build through the end of the year. And, it will end with PDFTextStream being the only serious choice for developers in enterprise-class environments.

There’s the tricky part, though: convincing people that our product is better than its competition. The foundation for that has been laid for PDFTextStream; it’s been borne out in customer experiences. The problem is that, without appropriate marketing, the people who are likely to appreciate that fact will never even know about your product. In order to change that, we’ve got to write good ad copy, hire good designers to craft and mold that copy into digestible elements (ad banners, text ads, white papers, editorial placements, etc.), and feed those elements into a cacophony of interruptive marketing noise to be noticed and not ignored.

Technical people and marketing folks have always had their differences; neither side understands the difficulties inherent in the other’s trade, and that often leads to disrespect. That is ever so slowly changing, in part because of pieces similar to this post, typically written by an in-the-trenches software company founder (like myself, I suppose), who inevitably describes how difficult marketing is. And seriously, it’s really, really hard.

Every step in the progression of tasks I enumerated that leads to a prospective customer seeing, noticing, and acting on a piece of advertising is hard. And personally, I find it very unpleasant, simply because I am, by nature, technical. I know how the bits in software work, and I know those types of things very well. It’s a perfect occupation for someone who is a bit of a control nut. Yes, I am that.

So it makes me very uneasy to engage in an activity (like marketing) where I cannot readily control the outcome. It makes me even more uneasy to engage in an activity (like marketing) where I am less than fully confident in my (and in this case, our) abilities. We are fundamentally technical; we know how the bits work. Even with help, we find the fuzzy, soft, vague world of marketing just a little scary.

That will get better in time, as we fail a little, succeed a little, and do a little more of the latter and a little less of the former each time we try. It would be a high crime to not try, try hard, and try often; we have a great product, it should be seen, and it will be seen.

Posted at 19 Jun @ 10:21 PM by Chas Emerick | 2 comments

blog entry  2005/06/28

The past 10 days have been just nuts.

When it rains, it’s buckets.

We got hit last week with serious inquiries from half a dozen very large organizations: a good mix of governmental, corporate, and nonprofit/research. Each of them already had a grasp of what PDFTextStream could mean to them and their projects, especially on the performance and text extraction quality fronts. However, each of them was also looking for some broader extraction functionality: bookmarks, annotations, tagged PDF structures, etc.

This is stuff we were already working on and planning to add into the mix, but these new requests certainly kicked the pace up quite a bit. Some of it was pretty quick and easy to finish up and move into beta phase — that will find its way into released versions very soon.

Other stuff is a little harder though, to put it mildly: OCR of text in images in PDFs, decryption of digitally signed documents, and other higher-order functionality. Again, all stuff we’ve been positioning ourselves to jump on, but when there’s fish to fry, we all start cooking a little faster. (Now’s when you’re supposed to groan at the horrible pun….)

So, we're definitely busy. Now, who said software slowed down in the summer?

Posted at 28 Jun @ 12:04 AM by Chas Emerick | 0 comments

Founder, Snowtide Informatics

About Me

I'm the founder of Snowtide Informatics. We make DocuHarvest, a web application that turns your valuable documents into data, and PDFTextStream, a PDF text extraction library for Java and .NET. I do a lot of programming in Clojure and just a little in Java, trying to make it easier for people to make unstructured content just a little more useful.
