Managing distributed XML document editing for nautical publications
Warning: This is a thought experiment meant to generate conversation. It’s in no way polished.
Recently, I posted how Matt and I think the XML markup might look for a geospatially enabled nautical publication. There are details missing including the geographic feature table.
GeoZui was originally built with Tcl/Tk, so it was easiest for Matt to use his familiarity with Tcl to build a quick table. There is an even bigger issue looming over the general problem. The traditional view of updating nautical publications seems to be very much like the traditional linear model of a book. There is a key author at the Hydrographic Organization (HO) and maybe some editors. This has worked effectively for the last century. We could take the current PDF and paper product, convert it to a new XML format, and make it work with traditional tools such as text editors and revision control (e.g. subversion [svn]) or a database.
However, I’d like to take a step back and imagine that we are starting from scratch with no pre-existing infrastructure. Throw away any constraints that particular countries might have and ask: What would be the best system that we could build now that would work for the next century? I know this is a bit ridiculous, but humor me for now.
A use case
NOTE: There should be many more use cases for this topic, but let's start with one.
What I think there needs to be is a traceable cycle of information. In the field or on the ship, anyone should be able to make a note and/or take a picture and submit an error, update, addition, or suggestion back to the HO. Let me give an example to illustrate how the cycle might work.
A mariner enters a port and sees that there is a new pier on the south side of the harbor replacing an older pier with a different configuration. The person pulls out their mobile phone (iPhone, Android, CrackBerry, etc.) and runs a “submit Coast Pilot (CP) note” application. The app uses the phone’s built-in camera to grab a picture, tags the picture with the position from the built-in GPS, then queues the update. Later, when the mobile phone detects the person is no longer moving, the phone puts up a notice that there is a submission waiting. The user takes a moment to write that the pier is different and hits the send button. The phone sends the update to a public website for review. Local mariners comment on the website that the pier is indeed different and give the name of the company working on the pier.
The HO is notified of the update, reviews the extra comments, and spends a little bit of money for an up-to-date satellite orthophoto. The HO digitizes the pier and sends notices to the charting branch and the nautical publications group. In the nautical publications group, an author calls up a list of all the sections of text that refer to the pier and makes any changes necessary after talking to the pier authorities. The text changes are tagged with all the information about the edit including references to the initial submission in the field. The changes are passed to the editor for review. Once these changes are accepted, a new coast pilot is generated, the submission note is marked as finished, and all the systems that use the coast pilot see the update and adjust their products.
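To make the cycle a bit more concrete, here is a minimal Python sketch of the submission note moving through the stages of the example. Everything in it is an assumption made for illustration: the `FieldNote` and `Status` names, the stage list, the note id, and the coordinates are all invented, and a real system would need many more states (rejection, duplicates, and so on).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    QUEUED = "queued"                # captured on the phone, not yet sent
    SUBMITTED = "submitted"          # on the public website, open for comments
    HO_REVIEW = "ho_review"          # the HO is checking the report
    EDITING = "editing"              # an author is changing the text
    EDITOR_REVIEW = "editor_review"  # the editor is reviewing the changes
    FINISHED = "finished"            # accepted, new publication generated


# Allowed transitions, in the order the use case walks through them.
NEXT = {
    Status.QUEUED: Status.SUBMITTED,
    Status.SUBMITTED: Status.HO_REVIEW,
    Status.HO_REVIEW: Status.EDITING,
    Status.EDITING: Status.EDITOR_REVIEW,
    Status.EDITOR_REVIEW: Status.FINISHED,
}


@dataclass
class FieldNote:
    note_id: str   # hypothetical identifier for the submission
    lat: float     # position from the phone's GPS
    lon: float
    photo: str     # file name of the picture
    text: str = ""
    status: Status = Status.QUEUED
    comments: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def advance(self, who):
        """Move to the next stage and record who did it and when."""
        if self.status not in NEXT:
            raise ValueError("note is already finished")
        new_status = NEXT[self.status]
        stamp = datetime.now(timezone.utc)
        self.history.append((self.status, new_status, who, stamp))
        self.status = new_status


# Made-up example values, not a real report.
note = FieldNote("note-0001", 0.0, 0.0, "pier.jpg")
note.text = "The pier on the south side has a new configuration."
note.comments.append("Local mariner: confirmed, the pier was rebuilt.")
for who in ["mariner", "HO", "HO", "author", "editor"]:
    note.advance(who)
print(note.status.value)
```

The `history` list is the part that matters for traceability: every hand-off is recorded, so the finished note can still answer who touched it and when.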
The above is just one instance of what is called a “use case” in software engineering terminology. Try not to get hung up on details, such as the fact that construction permitting in some countries already mandates submitting changes to the HO. The goal is to think about what a workflow might look like and what tools could support the process. How can we work with a marked up text document in a way that is easy for the people editing and allows us to see where any of the text came from (aka provenance)? I don’t know what the answer is, but talking to people at SNPWG brought up a number of ideas. Here are some of the thoughts that I’ve had and I would really like to hear what others think.
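One way to picture provenance in the markup itself is to hang it on the paragraphs as attributes. The sketch below uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`para`, `feature`, `source`) and the ids are invented for this example and are not the markup Matt and I have been discussing. It shows the "call up all the text that refers to the pier" step from the use case.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element name, attribute name, and id here
# is made up for illustration.
DOC = """
<publication>
  <section id="s1">
    <para feature="pier-12" source="note-0001">
      The pier on the south side of the harbor was rebuilt.
    </para>
    <para feature="light-3">The light is shown from a white tower.</para>
  </section>
  <section id="s2">
    <para feature="pier-12">Berthing at the pier is by arrangement.</para>
  </section>
</publication>
"""


def paragraphs_for_feature(root, feature_id):
    """Return (section id, source note or None, text) for each paragraph
    tagged with the given geographic feature."""
    hits = []
    for section in root.iter("section"):
        for para in section.iter("para"):
            if para.get("feature") == feature_id:
                text = " ".join(para.text.split())  # collapse whitespace
                hits.append((section.get("id"), para.get("source"), text))
    return hits


root = ET.fromstring(DOC)
for section_id, source, text in paragraphs_for_feature(root, "pier-12"):
    print(section_id, source or "(no field note)", text)
```

A paragraph without a `source` attribute is text whose origin was never recorded, which is exactly the situation for everything converted from the existing PDF and paper products.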
What this post is not
This is in no way a complete run through of what needs to be discussed!
Another important topic is managing image submissions. We would like to be able to avoid trouble with copyright or at least track who controls the rights. Georeferencing, feature identification and tagging will be important. I will try to talk more about this some other time.
These are in no particular order and I’m leaving out a number of commercial solutions (e.g. Visual Studio Team Solution). And I’m mixing types of technology.
The Usual Suspects
These are tools that have been used in the last 20 years to produce nautical publications. This is probably a fairly traditional style of creating publications.
Microsoft Word has been used to create all sorts of documents. It can do indexing, change tracking, comments, and all sorts of other fancy things. It tries to be everything to everybody, but trying to manage large teams using track changes is enough to make me go crazy. Plus, you never know what Microsoft will decide to do with Word in the future. I’ve seen people who are power users of Word and they do amazing things. But can a Word document handle 20 years of change tracking?
Adobe FrameMaker (or just Frame) has been the solution for team based, large document generation. I’ve used Frame and seen groups such as Netscape’s directory server (slapd) Team produce incredibly intricate documents. MS Word is still trying to catch up to the power of Frame in 1997. And that’s before XML came into the core of Frame. However, it does seem like Adobe really doesn’t want to keep Frame alive. Frame supports DITA, which I will talk about later.
Several people have told me that they see Adobe replacing Frame with InDesign. I really do not see how InDesign can cope with the workflow of a constantly updating document such as a Coast Pilot. Perhaps there is more to InDesign than I realize.
InDesign enables you to automate your publishing workflow
Automate workflows by minimizing repetitive tasks so you can focus on
what you love to do — design pages. Use text variables to dynamically
create running headers and footers and automatically generate advanced
bullets and numbering as well as tables of contents. InDesign also
enables a wide variety of XML-based publishing workflows. Support for
InDesign Markup Language (IDML), a new XML-based file format, lets
developers and systems integrators programmatically create, modify,
and deconstruct InDesign documents outside the context of InDesign to
more quickly create custom publishing workflows. Comprehensive
scripting support for JavaScript, AppleScript, and VBScript enables
you to not only automate tedious tasks, but also build complex
automated workflows.
What other systems have been in use? If you are willing to share, please send me an email or post a comment.
Revision Control Systems
With my background, the first thing that came to my mind is some sort of distributed revision control system / source control system (SCM). My experience has been mostly with RCS, CVS, and SVN, which are centralized systems. Therefore, I am a bit out of my league here. There are a number of open source implementations: git, GNU arch, Mercurial (hg), Bazaar, darcs, monotone, svn+svk, and a host of others. Perhaps one of these really is the best solution if it were to be integrated with a nice editor and interfaces to easily allow tracing of the origin of the text. Can we add functionality to record that a particular change in the text goes to a particular change submission from the field?
Trac does something similar with svn repositories. If you commit a change to svn, you can reference a ticket in Trac. Adding Trac tickets in commits is not enforced or supported in any of the interfaces, so you have to know how to tag ticket references. If this is the right solution, how do we build a system that keeps the writers from having to become a pro at version control systems? The HO staff should be focused on the maritime issues and their skills should be focused on the writing.
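As a rough idea of how the tagging could be checked by a tool instead of left to the writer's memory, here is a small Python sketch. It assumes a made-up convention in which the commit message carries lines of the form `Field-Note: <id>`. This is not Trac's own ticket syntax, and the function names are hypothetical. An editor front end could add the line automatically and a repository hook could call `check_commit` to refuse untagged changes.

```python
import re

# Hypothetical convention (not Trac's syntax): the commit message
# contains one line per field submission, e.g. "Field-Note: note-0001".
TRAILER = re.compile(r"^Field-Note:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def field_notes(commit_message):
    """Return the ids of all field notes referenced in a commit message."""
    return TRAILER.findall(commit_message)


def check_commit(commit_message):
    """Reject a commit message that does not reference any field note."""
    notes = field_notes(commit_message)
    if not notes:
        raise ValueError("commit message must reference a Field-Note")
    return notes


message = """Update pier description for the south side of the harbor

Field-Note: note-0001
Field-Note: note-0002
"""
print(check_commit(message))
```

The point of the design is that the writer never types the reference by hand: the tooling knows which submission the author opened and fills it in.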
Or could this be done on top of a non-distributed version control system? I am leaning towards rejecting the idea of something like straight svn. You would like to be able to send nautical publications writers out in the field and have them commit changes without having internet access. This is especially important in polar regions. You cannot always count on internet access and Iridium’s bandwidth is just not large. The “svn blame” command illustrates a little bit of the idea of being able to see the source of text.
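To show what a blame-style view means for prose, here is a toy version in Python built on the standard `difflib` module. It is only a sketch of the idea, not how svn implements blame: the `blame` function and the revision labels are invented, and it works on whole lines, where a real tool for nautical text would probably want sentences or XML elements.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Annotate each line of the newest revision with the label of the
    revision that introduced it.

    revisions: list of (label, list_of_lines), oldest first.
    Returns a list of (label, line) pairs for the newest revision.
    """
    annotated = []   # (label, line) pairs for the previous revision
    prev_lines = []
    for label, lines in revisions:
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the label they already had.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines are credited to this revision.
                # For a pure deletion j1 == j2, so nothing is added.
                updated.extend((label, line) for line in lines[j1:j2])
        annotated = updated
        prev_lines = lines
    return annotated


# Made-up text and labels for illustration only.
history = [
    ("r1", ["The pier is on the south side.",
            "Fuel is available nearby."]),
    ("note-0001", ["The pier is on the south side.",
                   "The pier was rebuilt with a new configuration.",
                   "Fuel is available nearby."]),
]
for label, line in blame(history):
    print(f"{label:>10}  {line}")
```

If the labels are field submission ids instead of revision numbers, the output is a direct answer to the question of where each piece of text came from.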
If I were to ask a number of people at NASA who I’ve worked with in the past, I’m sure they would tell me they could build the killer nautical publications collaboration system on top of Eclipse complete with resource management, fuel budgets, and AI to try to detect issues in the text.
SharePoint
Microsoft SharePoint appears at first look to be the sort of tool that the community needs to manage nautical publications. Documents can be edited on the server (Windows only), checked out, tracked, and so forth. However, the versioning of SharePoint and the documents it contains are not really linked. Can SharePoint be made to understand what is different about files and can it build workflows? My experience with SharePoint has been very frustrating. It feels like a funky front end to a version control system that doesn’t have all the features of version control that people count on.
There are a whole slew of SharePoint clones or workalikes (e.g. Document Repository System or OpenDocMan). This is definitely a space to watch for innovation.
DocBook
Technologies such as LaTeX are traditionally used to write papers and work well when tracked in a revision control system. However, LaTeX does not have a mechanism to store extra information such as provenance or location, so it doesn’t make sense here. What would Donald Knuth say about a geospatially aware LaTeX?
Perhaps another possibility is to work with the tools around DocBook? There are many tools built around DocBook. For example, calenco is a collaborative editing web platform. I have no experience with this kind of thing, but perhaps the DocBook tools could work with other XML types such as a nautical publication schema. Another tool, which looks like it hasn’t seen any update in a long time, is OWED – Online Wiki Editor for DocBook.
There are many other document markup languages (e.g. S1000D). Are any of them useful to this kind of process? Wikipedia has a Comparison of document markup languages that covers many more.
Databases
What support do traditional relational databases have for this kind of task? I’ve looked at how wikis such as Trac or Mediawiki store their entries and they are fairly simple systems that store many copies of a document each with a revision tag: Trac schema and Mediawiki schemas.
Perhaps there are better technologies built directly into database systems? There is SQL/XML [ISO International Standard ISO/IEC 9075-14:2003] (SQL/XML Tutorial), which might have useful functionality, but I have not had time to look into it.
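For comparison, the simple wiki-style approach of storing many full copies, each with a revision tag, can be sketched in a few lines with Python's built-in `sqlite3` module. The table and column names below are my own invention and are not the actual Trac or Mediawiki schemas. The only twist is an extra `note_id` column tying a revision back to a field submission.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE revision (
        section_id TEXT    NOT NULL,  -- which piece of the publication
        version    INTEGER NOT NULL,  -- revision tag, counting from 1
        author     TEXT    NOT NULL,
        note_id    TEXT,              -- field submission, if any
        body       TEXT    NOT NULL,  -- full copy of the text
        PRIMARY KEY (section_id, version)
    )
""")


def save(conn, section_id, author, body, note_id=None):
    """Store a full copy of the section under the next version number."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM revision WHERE section_id = ?",
        (section_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO revision VALUES (?, ?, ?, ?, ?)",
        (section_id, latest + 1, author, note_id, body),
    )
    return latest + 1


def current(conn, section_id):
    """Return (version, note_id, body) of the newest copy, or None."""
    return conn.execute(
        "SELECT version, note_id, body FROM revision "
        "WHERE section_id = ? ORDER BY version DESC LIMIT 1",
        (section_id,),
    ).fetchone()


# Made-up section id, authors, and text.
save(conn, "harbor-south", "author-a", "The pier is on the south side.")
save(conn, "harbor-south", "author-b",
     "The pier on the south side was rebuilt.", note_id="note-0001")
print(current(conn, "harbor-south"))
```

This records which revision came from which submission, but only at the level of a whole section. It cannot say which sentence changed without diffing the copies, which is where the database XML features might earn their keep.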
Oracle has Content Management, XML, and Text. I don’t use Oracle and don’t know much about what they have to offer.
PostgreSQL has a lot of powerful features including XML Support with the XML Type.
Content Management Systems (CMS)
This topic sounds exactly like what we need, but having used a number of these systems, I think they miss out on many of the tracking features that are needed and they are mostly focused on the HTML output side of things. There are a huge number of systems to choose from as shown by Wikipedia’s List of content management systems. They run the range from wikis (e.g. TWiki) to easy to deploy web systems (e.g. Joomla!), all the way to frameworks that have you build up your own system (e.g. Django or Ruby on Rails). I don’t think the prebuilt systems are really up for this kind of task, but combining a powerful framework with the right database technology or version control system might well be a good answer.
Darwin Information Typing Architecture (DITA)
DITA sounds very interesting, but might not be capable of helping documents like nautical publications that have less structure.
Closing thoughts
As you can see from the above text, my ideas on this topic are far from fully formed. I want to leave you with a couple of Wikipedia topics:
I am looking for opinions on all sides of the technology and editing spectrum. I am sure many publishers have faced and survived this problem over the years. I definitely ran out of steam on this post and it's probably my largest post where I'm not quoting some large document or code. Hopefully the above made some sense.