Converting XML for Use in Teamsite

Suppose we have a collection of XML files containing content that we want to import into Teamsite. The content may come from a database, from another CMS, or from a collection of existing files (such as an XHTML website). There are several things we need to accomplish in order to get the files ready for use with Teamsite’s Form Publisher, the first being to convert the items to the proper format.

Teamsite provides the user with a set of Perl utilities for manipulating content. To do everything I wanted without tedious command-line work, and lacking a readily available tool, I created a quick Perl script that handles several tasks. It takes a set of files in a directory tree and runs them through Saxon twice: once to convert each file to a Data Capture Record (DCR) in a single directory, and once to create the file in the directory tree as it would be generated using the template. It then cleans up the generated files. In the case of the DCRs, this includes removing the file extension, since DCRs don’t use an extension even though they’re XML files.

This is the script that performs all the necessary conversions. For the XML conversion, it invokes Saxon through a system call, so it is definitely not optimized for production use (I took this route mainly because I had already downloaded Saxon). I wrote it to run on Windows with ActiveState’s ActivePerl, rather than using Cygwin or running it on the server running Teamsite:

#!perl
use File::Find;
use File::Copy;
use File::Path;
use HTML::Entities; 
 
# Configuration:
$origContent_path = "/MyConvertDirectory/MyExportedContent";
$origConverted_path = "/MyConvertDirectory/MyConvertedXMLContent";
$origConvertedDCR_path = "/MyConvertDirectory/MyConvertedDCRContent";
$saxon_path = "C:\\Saxon\\saxon9.jar";
$xsl_path = "/MyConvertDirectory/convertXML.xsl";
$xsl2dcr_path = "/MyConvertDirectory/convertDCR.xsl";
 
my $dir = $origContent_path;
print "Starting at ".$dir."\n\n";
print "Processing...";
mkpath( $origConvertedDCR_path ) if not -e $origConvertedDCR_path;
use Cwd;
find(\&convertItems, $dir);
find(\&cleanupFilesDirs, $origConverted_path);
find(\&stripDCR, $origConvertedDCR_path);
print "\n";
print "Done!\n";
 
#
sub convertItems {
	return unless -d;
	my $sourcedir = $File::Find::name;
	my $targetdir = $File::Find::name;
	$targetdir =~ s/Home/SiteContent/o;
	$targetdir =~ s/\s//g;
	$targetdir =~ s/\b([A-Z]{1})([A-Za-z]+)\b/\l$1$2/g;
	print "Transform ".$sourcedir." to ".$targetdir."\n";
	print ".";
	mkpath( $targetdir ) if not -e $targetdir;
	system "java -jar $saxon_path -o \"$targetdir\" -s \"$sourcedir\" -xsl \"$xsl_path\"";
	system "java -jar $saxon_path -o \"$origConvertedDCR_path\" -s \"$sourcedir\" -xsl \"$xsl2dcr_path\"";
}
# Decode entities in the content.
# Then rename each file to Unix format, removing spaces and lowercasing the first letter of the name.
# Since I'm exceptionally lazy, I just unlink the item if its size is under 39 bytes (meaning Saxon parsed the item, but the XSL transform returned just an XML header as the contents of the file).
# Except for the decoding and not stripping the extension, this method is the same as the method below.
sub cleanupFilesDirs {
	-d and return; #skip item if it's a directory
	if(-s $_ < 39) {unlink $_ and return;}
	my $filename = $_;
	open(XMLCONTENT, "<$filename") or return;
	open(XMLOUT, ">temp.xml") or return;
	while (<XMLCONTENT>) {
		chomp;
		my $content = $_;
		$content = decode_entities($content)."\n";
		print XMLOUT $content;
	}
	close XMLCONTENT;
	close XMLOUT;
	unlink $filename;
	$filename =~ s/([A-Za-z])\s?-\s/$1-/g;
	$filename =~ s/\s//g;
	$filename =~ s/^([A-Z])/\l$1/;
#	print  "Renaming content item temp.xml to ".$filename."\n";
	move ("temp.xml","$filename");
}
# Rename each file to Unix format, removing spaces, lowercasing the first letter of the name and stripping off the extension.  Since DCR files reside in a single directory, the path is flattened.
# Since I'm exceptionally lazy, I just unlink the item if its size is under 39 bytes (meaning Saxon parsed the item, but the XSL transform returned just an XML header as the contents of the file).
sub stripDCR {
	-d and return; #skip item if it's a directory
	if(-s $_ < 39) {unlink $_; return;} #nothing left to rename once the file is removed
	my $oldname = $_;
	my $newname = $oldname;
	chomp($newname);
	$newname =~ s/\s//g;
	$newname =~ s/^([A-Z])/\l$1/;
	$newname =~ s/_?\.xml$//;
	$newname =~ s/([A-Za-z])\s?-\s/$1 - /g;
#	print  "Renaming DCR item ".$oldname." to ".$newname."\n";
	if ($oldname eq $newname) { return;}
	rename($oldname, $newname);
}
exit;

The XSL files used are not provided in full because a) they’re relatively trivial and b) they’re customized on both ends (both based on the structure of the XML you’re converting and the XML being generated). In fact, the second XSL used to create the generated file from the DCR is essentially a representation of the Teamsite Template (.tpl) file and can be used to output XML, HTML, XHTML or whatever you need for your particular site.

Here’s the basic XSL to generate the DCR for Teamsite 6.4:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs = "http://www.w3.org/2001/XMLSchema"
    xmlns:date="http://exslt.org/dates-and-times"
    exclude-result-prefixes="xs date">
<xsl:output omit-xml-declaration="no" method="xml" indent="no" doctype-system="dcr4.5.dtd" />
<xsl:template match="/">
  <!-- matched root -->
   <xsl:apply-templates select="docTopNode"/>
 </xsl:template>
 
 <xsl:template match="docTopNode">
  <xsl:variable name="docTitle" select="title" />
  <record name="{$docTitle}" type="content"><item name="newsitem"><value>
   <item name="title"><value><xsl:value-of select="title"/></value></item>
   <item name="mainContent"><value><xsl:value-of select="docContent/docBodyContent"/></value></item>
   <item name="sequence"><value>1</value></item>
  </value></item></record>
 </xsl:template>
 
 <xsl:template name="FormatDate">
  <xsl:param name="DateTime" />
  <xsl:value-of select="concat(substring($DateTime,6,2),'/',substring($DateTime,9,2),'/',substring($DateTime,1,4))"/>
 </xsl:template>
 <xsl:template name="FormatTime">
  <xsl:param name="thisTime" />
  <xsl:value-of select="substring($thisTime,12,8)"/>
 </xsl:template>
 
 <xsl:function name="date:parse-date" as="xs:string">
  <xsl:param name="parsable-date" />
  <xsl:choose>
   <xsl:when test="$parsable-date = '' ">2001-01-01</xsl:when>
   <xsl:otherwise>
    <xsl:value-of select="xs:string(xs:dateTime(xs:dateTime('1970-01-01T00:00:00') + $parsable-date *
 xs:dayTimeDuration('PT0.001S')))" />
   </xsl:otherwise>
  </xsl:choose>
 </xsl:function>
</xsl:stylesheet>

This stylesheet would transform an XML document like:

<?xml version="1.0"?>
<docTopNode>
	<title>Document Title</title>
	<docContent>
		<docBodyContent>
			&lt;p&gt;This might be some escaped HTML content with formatting&lt;/p&gt;
		</docBodyContent>
		<unusedContentNode>
			This should be ignored by the XSL.
		</unusedContentNode>
		<postingDate>1148281200000</postingDate>
	</docContent>
</docTopNode>

In this document, note the format of the date: 1148281200000 is the number of milliseconds since January 1st, 1970, which is how the value came back from MS SQL Server. To convert it for Teamsite’s use, I just needed an XPath 2.0 expression that adds that many milliseconds to the epoch, as the date:parse-date function in the stylesheet above does (it is the reverse of the usual function for calculating milliseconds since 1970).
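
As a quick sanity check outside of XSLT, the same arithmetic can be sketched in shell. This is only an illustration: it assumes GNU date (for the -d @seconds form) and a shell with 64-bit arithmetic, and the function name epoch_ms_to_iso is made up here rather than taken from the script above.

```shell
#!/bin/sh
# Convert a count of milliseconds since 1970-01-01T00:00:00 UTC
# into an ISO 8601 timestamp in UTC.
# Assumption: GNU date is available (BSD/macOS date uses -r instead of -d @...).
epoch_ms_to_iso() {
    ms="$1"
    secs=$((ms / 1000))          # drop the millisecond part
    date -u -d "@${secs}" '+%Y-%m-%dT%H:%M:%S'
}

# The postingDate value from the sample document:
epoch_ms_to_iso 1148281200000    # prints 2006-05-22T07:00:00
```

The result is in UTC, so the sample value corresponds to midnight in a UTC-7 time zone. Any local-time adjustment would have to be added separately.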

The one remaining step after everything is converted and imported into Teamsite is to set the extended attributes that associate each file with its DCR type and its DCR.
List the file attributes:
/opt/teamsite/bin/iwextattr -l /pathtofile/file.xml
Set an attribute:
/opt/teamsite/bin/iwextattr -s attribute=value /pathtofile/file.xml
For example:
/opt/teamsite/bin/iwextattr -s TeamSite/Templating/DCR/Type=site/newsitem \
    //teamsitehome/WORKAREA/shared/templatedata/site/newsitem/data/040108NewItemDCR

and, on the generated item:
/opt/teamsite/bin/iwextattr -s TeamSite/Templating/PrimaryDCR=040108NewItemDCR \
    //teamsitehome/WORKAREA/pages/site/040108NewItem.xml
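
With more than a handful of imported items, typing these commands by hand gets old quickly. The following is a minimal sketch of how the DCR type could be set in bulk. It is not a Teamsite utility: the function name dcr_type_commands and the IWEXTATTR variable are made up for illustration, and it assumes the DCR directory is reachable as an ordinary path the shell can list. It only prints the commands, so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Path to iwextattr; defaults to the location used in the examples above.
IWEXTATTR="${IWEXTATTR:-/opt/teamsite/bin/iwextattr}"

# dcr_type_commands DIR TYPE
# Print one iwextattr command per regular file in DIR, setting
# TeamSite/Templating/DCR/Type to TYPE. Nothing is executed here.
dcr_type_commands() {
    dcr_dir="$1"
    dcr_type="$2"
    for dcr in "$dcr_dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty directory.
        [ -f "$dcr" ] || continue
        printf '%s -s TeamSite/Templating/DCR/Type=%s "%s"\n' \
            "$IWEXTATTR" "$dcr_type" "$dcr"
    done
}
```

For example, dcr_type_commands /path/to/newsitem/data site/newsitem lists the commands, and piping that output to sh would run them. The quoting is deliberately simple and relies on the DCR names having had their whitespace stripped by the conversion script.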


Why Blog?

I enjoy reading Steve Yegge’s blog. In his most recent post, he said:
“Even if nobody were to read my blog, the act of writing things down helps me think more clearly, and it’s engaging in the same way that solving a Sudoku puzzle is engaging.”
“You should try it yourself. All it takes is a little practice.”

Truer words were never said, at least for me. I was thinking about the difference between the items I commit to this blog and everything else I keep. Since I started, I’ve found that blogging not only helps me organize my thoughts (much like a good project retrospective or post-mortem should), but also gives me a sense of purpose and energy. I keep notes on projects, I have libraries of code snippets, and I save all manner of links and other electronic ephemera for my personal reference. But by discriminating among them, winnowing out the chaff and selecting particular items to polish and post, I gain new perspective on the work I do. And, like Mr. Yegge, even if no one ever reads this blog, I find it worthwhile. But you should read Steve’s blog; it’s quite excellent.


Scrumfall and Other Methodologies

I currently work as part of a team using Agile practices. I’ve experienced some of the practices before and definitely see the benefits of methodologies like XP and Scrum. One thing we’ve struggled with on our latest project has been lapsing into a practice best described as “scrumfall.” In fact, even though we have daily standups and move tasks on the board, we’re really practicing waterfall development at the moment. So when I came across the Waterfall Manifesto, I just had to share it with the team.

I liked the Principles, which include things like:

  • Business people and developers must work together as fewer times as possible throughout the project. They both have other important activities to carry to waste their precious time in meetings. Furthermore, they do not speak the same language.
  • The safer method of conveying information to and within a development team is by a formal document approved by the steering committee.
  • At regular intervals, the team should meet to eat pizzas and drink beer. It helps developers to forget that they are working in a bloody software development project and confirms that the management really cares about people. (they do not mention pizzas and beers in the other manifesto, think about it…)

Maybe not as humorous as The Register’s Bastard Operator, but more apropos of our current state.
