Converting XML for Use in Teamsite

Suppose we have a collection of XML files containing content that we want to import into Teamsite. The content may come from a database, from another CMS, or from a collection of existing files (such as an XHTML website). There are several things we need to accomplish in order to get the files ready for use with Teamsite’s Form Publisher, the first being to convert the items to the proper format.

Teamsite provides the user with a set of Perl utilities for manipulating content. To do what I wanted without tedious command-line work, and lacking a readily available tool, I wrote a quick Perl script that handles several tasks. It takes a set of files in a directory tree and runs them through Saxon twice: once to convert each one to a Data Capture Record (DCR) in a single directory, and once to create the file in the directory tree as it would be generated using the template. It then cleans up the various generated files. In the case of the DCRs, this includes removing the file extension, since DCRs don’t use an extension even though they’re XML files.

This is the script that performs all the necessary conversions. To do the XML conversion, it uses Saxon via a system call and is definitely not optimized for production use (I took this route mainly because I had already downloaded Saxon). I wrote it to run on Windows with ActiveState’s ActivePerl, rather than using Cygwin or running it on the server running Teamsite:

#!perl
use File::Find;
use File::Copy;
use File::Path;
use HTML::Entities; 
 
# Configuration:
$origContent_path = "/MyConvertDirectory/MyExportedContent";
$origConverted_path = "/MyConvertDirectory/MyConvertedXMLContent";
$origConvertedDCR_path = "/MyConvertDirectory/MyConvertedDCRContent";
$saxon_path = "C:\\Saxon\\saxon9.jar";
$xsl_path = "/MyConvertDirectory/convertXML.xsl";
$xsl2dcr_path = "/MyConvertDirectory/convertDCR.xsl";
 
my $dir = $origContent_path;
print "Starting at ".$dir."\n\n";
print "Processing...";
mkpath( $origConvertedDCR_path ) if not -e $origConvertedDCR_path;
use Cwd;
find(\&convertItems, $dir);
find(\&cleanupFilesDirs, $origConverted_path);
find(\&stripDCR, $origConvertedDCR_path);
print "\n";
print "Done!\n";
 
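# For every directory in the source tree, derive the target directory name and run both Saxon transforms on it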
sub convertItems {
	return unless -d;
	my $sourcedir = $File::Find::name;
	my $targetdir = $File::Find::name;
	$targetdir =~ s/Home/SiteContent/o;
	$targetdir =~ s/\s//g;
	$targetdir =~ s/\b([A-Z]{1})([A-Za-z]+)\b/\l$1$2/g;
	print "Transform ".$sourcedir." to ".$targetdir."\n";
	print ".";
	mkpath( $targetdir ) if not -e $targetdir;
	system "java -jar $saxon_path -o \"$targetdir\" -s \"$sourcedir\" -xsl \"$xsl_path\"";
	system "java -jar $saxon_path -o \"$origConvertedDCR_path\" -s \"$sourcedir\" -xsl \"$xsl2dcr_path\"";
}
# Decode entities in the content,
# then rename each file, removing spaces and lower-casing the first letter of the name.
# Since I'm exceptionally lazy, I just unlink the item if its size is under 39 bytes (meaning Saxon parsed the item, but the XSL transform returned just an XML header as the contents of the file)
# Except for decoding the entities and keeping the extension, this sub does the same job as stripDCR below
sub cleanupFilesDirs {
	-d and return; #skip item if it's a directory
	if(-s $_ < 39) {unlink $_ and return;}
	my $filename = $_;
	open(XMLCONTENT, "<$filename") or return;
	open(XMLOUT, ">temp.xml") or return;
	while (my $line = <XMLCONTENT>) {
		chomp $line;
		# decode_entities turns escaped markup such as &lt;p&gt; back into <p>
		print XMLOUT decode_entities($line)."\n";
	}
	close XMLCONTENT;
	close XMLOUT;
	unlink $filename;
	$filename =~ s/([A-Za-z])\s?-\s/$1-/g;
	$filename =~ s/\s//g;
	$filename =~ s/^([A-Z])/\l$1/;
#	print  "Renaming content item temp.xml to ".$filename."\n";
	move ("temp.xml","$filename");
}
# Rename each file, removing spaces, lower-casing the first letter of the name and stripping off the extension.  Since DCR files reside in a single directory, the path is flattened.
# Since I'm exceptionally lazy, I just unlink the item if its size is under 39 bytes (meaning Saxon parsed the item, but the XSL transform returned just an XML header as the contents of the file)
sub stripDCR {
	-d and return; #skip item if it's a directory
	if(-s $_ < 39) {unlink $_; return;} #nothing left to rename once the file is deleted
	my $oldname = $_;
	my $newname = $oldname;
	chomp($newname);
	$newname =~ s/\s//g;
	$newname =~ s/^([A-Z])/\l$1/;
	$newname =~ s/_?\.xml$//;
	# No-op as written: all whitespace was already removed above, so this pattern cannot match
	$newname =~ s/([A-Za-z])\s?-\s/$1 - /g;
#	print  "Renaming DCR item ".$oldname." to ".$newname."\n";
	if ($oldname eq $newname) { return;}
	rename($oldname, $newname);
}
exit;
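
The renaming rules in stripDCR are easier to check in isolation than inside a File::Find callback. Here is a minimal sketch that applies the same substitutions to a bare file name. The helper name dcr_name and the sample file names are my own assumptions for illustration, not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitutions as stripDCR
# to a file name, without touching the file system.
sub dcr_name {
    my ($name) = @_;
    $name =~ s/\s//g;            # remove all whitespace
    $name =~ s/^([A-Z])/\l$1/;   # lower-case the first letter only
    $name =~ s/_?\.xml$//;       # drop ".xml" and an underscore directly before it
    return $name;
}

# Sample names are made up for illustration.
print dcr_name('040108 New Item DCR.xml'), "\n";   # prints 040108NewItemDCR
print dcr_name('My News Item_.xml'), "\n";         # prints myNewsItem
```

Note that only the first character is lower-cased, so the rest of the name keeps its original capitalization.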

The XSL files used are not provided in full because a) they’re relatively trivial and b) they’re customized on both ends (both based on the structure of the XML you’re converting and the XML being generated). In fact, the second XSL used to create the generated file from the DCR is essentially a representation of the Teamsite Template (.tpl) file and can be used to output XML, HTML, XHTML or whatever you need for your particular site.

Here’s the basic XSL to generate the DCR for Teamsite 6.4:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs = "http://www.w3.org/2001/XMLSchema"
    xmlns:date="http://exslt.org/dates-and-times"
    exclude-result-prefixes="xs date">
<xsl:output omit-xml-declaration="no" method="xml" indent="no" doctype-system="dcr4.5.dtd" />
<xsl:template match="/">
  <!-- matched root -->
   <xsl:apply-templates select="docTopNode"/>
 </xsl:template>
 
 <xsl:template match="docTopNode">
  <xsl:variable name="docTitle" select="title" />
  <record name="{$docTitle}" type="content"><item name="newsitem"><value>
   <item name="title"><value><xsl:value-of select="title"/></value></item>
   <item name="mainContent"><value><xsl:value-of select="docContent/docBodyContent"/></value></item>
   <item name="sequence"><value>1</value></item>
  </value></item></record>
 </xsl:template>
 
 <xsl:template name="FormatDate">
  <xsl:param name="DateTime" />
  <xsl:value-of select="concat(substring($DateTime,6,2),'/',substring($DateTime,9,2),'/',substring($DateTime,1,4))"/>
 </xsl:template>
 <xsl:template name="FormatTime">
  <xsl:param name="thisTime" />
  <xsl:value-of select="substring($thisTime,12,8)"/>
 </xsl:template>
 
 <xsl:function name="date:parse-date" as="xs:string">
  <xsl:param name="parsable-date" />
  <xsl:choose>
   <xsl:when test="$parsable-date = '' ">2001-01-01</xsl:when>
   <xsl:otherwise>
    <xsl:value-of select="xs:string(xs:dateTime(xs:dateTime('1970-01-01T00:00:00') + $parsable-date *
 xs:dayTimeDuration('PT0.001S')))" />
   </xsl:otherwise>
  </xsl:choose>
 </xsl:function>
</xsl:stylesheet>

This stylesheet would transform an XML document like:

<?xml version="1.0"?>
<docTopNode>
	<title>Document Title</title>
	<docContent>
		<docBodyContent>
			&lt;p&gt;This might be some escapedHTML content with formatting&lt;/p&gt;
		</docBodyContent>
		<unusedContentNode>
			This should be ignored by the XSL.
		</unusedContentNode>
		<postingDate>1148281200000</postingDate>
	</docContent>
</docTopNode>

In this document, note the format of the date. It is expressed in milliseconds since January 1st, 1970, which is how it came back from MS SQL Server. To convert it for Teamsite’s use, I just needed an XPath expression (the reverse of the function that calculates milliseconds since 1970 in XPath 2.0); this is what the date:parse-date function in the stylesheet does.
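
To sanity-check the stylesheet's output, the same arithmetic can be reproduced in Perl. This is only a sketch: the helper names ms_to_iso and format_date are hypothetical, the conversion is done in UTC, and the '2001-01-01' fallback for an empty value mirrors the one in the XSL function.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: milliseconds since 1970-01-01T00:00:00 (UTC)
# to an ISO-style timestamp, like date:parse-date in the stylesheet.
sub ms_to_iso {
    my ($ms) = @_;
    return '2001-01-01' if !defined $ms || $ms eq '';   # same fallback as the XSL
    my $seconds = int($ms / 1000);                      # drop the milliseconds
    return strftime('%Y-%m-%dT%H:%M:%S', gmtime($seconds));
}

# Hypothetical helper: same substring logic as the FormatDate template (MM/DD/YYYY).
sub format_date {
    my ($iso) = @_;
    return join('/', substr($iso, 5, 2), substr($iso, 8, 2), substr($iso, 0, 4));
}

my $iso = ms_to_iso('1148281200000');   # postingDate from the sample document
print "$iso\n";                         # prints 2006-05-22T07:00:00
print format_date($iso), "\n";          # prints 05/22/2006
```

The sample value lands on 07:00 UTC, which suggests the original timestamp was midnight in a UTC-7 time zone, so any time zone adjustment would have to be added separately.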

The one remaining step after everything is converted and imported to Teamsite is to set the extended attributes that associate each DCR with its type and each generated file with its DCR, using iwextattr.
List the file attributes:
/opt/teamsite/bin/iwextattr -l /pathtofile/file.xml
Set an attribute:
/opt/teamsite/bin/iwextattr -s attribute=value /pathtofile/file.xml
For example:
/opt/teamsite/bin/iwextattr -s TeamSite/Templating/DCR/Type=site/newsitem //teamsitehome/WORKAREA/shared/templatedata/site/newsitem/data/040108NewItemDCR

and, on the generated item:
/opt/teamsite/bin/iwextattr -s TeamSite/Templating/PrimaryDCR=040108NewItemDCR //teamsitehome/WORKAREA/pages/site/040108NewItem.xml
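
With more than a handful of files, typing these commands by hand gets old quickly. Here is a rough sketch of how the DCR type command could be generated for a list of DCRs. The helper dcr_type_command and the list of names are assumptions for illustration; the iwextattr path, the -s form and the attribute name are taken from the commands above. The system call is left commented out so the sketch only prints what it would run.

```perl
use strict;
use warnings;

my $iwextattr = '/opt/teamsite/bin/iwextattr';

# Hypothetical helper: builds the argument list for setting the DCR type
# on a single DCR, in the "-s attribute=value file" form shown above.
sub dcr_type_command {
    my ($type, $dcr_path) = @_;
    return ($iwextattr, '-s', "TeamSite/Templating/DCR/Type=$type", $dcr_path);
}

my $data_dir = '//teamsitehome/WORKAREA/shared/templatedata/site/newsitem/data';
my @dcrs     = ('040108NewItemDCR');   # in practice, read these from the data directory

for my $dcr (@dcrs) {
    my @cmd = dcr_type_command('site/newsitem', "$data_dir/$dcr");
    print join(' ', @cmd), "\n";
    # On the Teamsite server, run it with the list form of system:
    # system(@cmd) == 0 or warn "iwextattr failed for $dcr\n";
}
```

Passing the command to system as a list avoids shell quoting problems if a path ever contains spaces.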
