Javascript Sorting

Not too long ago, I was asked, as a theoretical question, how I might take a string of random alphabetical characters and sort them in order using Javascript. Thus, given NFARAKDMSLSJDQKJKCXLK, find a simple way to turn it into AACDDFJJKKKKLLMNQRSSX. It’s a pretty basic question for anyone who uses Javascript, and I immediately thought of splitting the string into an array and joining it back together (a quick and easy way to tokenize the string). However, for some reason, I completely forgot about the Array.sort() function, which has an equivalent in pretty much every language.

Instead, the first thing that came to mind was the last time I had to sort a collection and started thinking of Java sorting. In the work I’ve done, such sorting usually requires the use of comparators because the default sort order is often not the one that’s desired. As a case in point, we have a set of JSP pages for rendering content including archive and list pages that display in posting date or sequence order derived from the content. For example, here’s some code to sort by a custom “sequence” attribute using the Vignette Content Management API (this particular snippet is the work of the inimitable Yong Chen):

Comparator aComparator = new Comparator(){
	public int compare(Object obj1, Object obj2) {
		int result=0;
		try {
			Integer s1 = (Integer) ((ContentInstance) obj1).getRelations(CT_RELATION_HEALTH_PLAN)[0].getAttribute("sequence").getValue();
			Integer s2 = (Integer) ((ContentInstance) obj2).getRelations(CT_RELATION_HEALTH_PLAN)[0].getAttribute("sequence").getValue();
			result=s1.compareTo(s2);
		} catch (com.vignette.as.client.exception.ApplicationException e) {
			// If the sequence can't be read, leave result at 0 so the items compare as equal
		}
		return result;
	}
};

For debugging, I might break it out a little differently to avoid the multiple casts in a single line. I think the Vignette API is fairly robust, but I would still split the ContentInstance cast apart from the Java native Integer cast in this case. But that might just be me.

In any event, while Javascript can also use comparators without much loss of speed, for a task like the one described at the start, a simple sort will do the job, and the whole thing can even be condensed to a single-line function like this:

function order(unsortedString) {
	return unsortedString.split("").sort().join("");
}
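
Here is a quick check of that function against the string from the question, plus one caveat worth knowing: with no comparator, sort() compares UTF-16 code units, so every uppercase letter sorts ahead of every lowercase one. The orderIgnoringCase helper is a hypothetical name of my own, not anything built in; treat it as a sketch of one way to handle mixed case rather than the definitive approach.

```javascript
function order(unsortedString) {
  return unsortedString.split("").sort().join("");
}

// Hypothetical helper: compares letters without regard to case
function orderIgnoringCase(unsortedString) {
  return unsortedString
    .split("")
    .sort(function (a, b) {
      const x = a.toLowerCase();
      const y = b.toLowerCase();
      if (x < y) return -1;
      if (x > y) return 1;
      return 0;
    })
    .join("");
}

console.log(order("NFARAKDMSLSJDQKJKCXLK")); // AACDDFJJKKKKLLMNQRSSX
console.log(order("bCa")); // Cab (uppercase C sorts first)
console.log(orderIgnoringCase("bCa")); // abC
```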

Depending on the browser, the sort function will be some variation on a quick sort or a merge sort. It depends on whether the browser architects chose a stable algorithm or opted for some other version based on various considerations. Mozilla used to use a quicksort prior to Firefox 3.5, but they opted for a merge sort instead in more recent versions. In the course of reading about sort implementations, I came across notes on timsort, Python’s default sort implementation, that I found particularly intriguing.
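
If you are curious which way a particular engine went, a rough probe like the following can help. It is only a sketch: the function name isSortStable and the record shape are my own, and a true result merely means no instability showed up on this input, while a false result does prove the sort is unstable.

```javascript
// Build records whose keys collide, sort by key only, then check whether
// records with equal keys kept their original relative order.
function isSortStable(size) {
  const records = [];
  for (let i = 0; i < size; i++) {
    records.push({ key: i % 3, index: i });
  }
  records.sort(function (a, b) {
    return a.key - b.key;
  });
  for (let i = 1; i < records.length; i++) {
    const prev = records[i - 1];
    const curr = records[i];
    if (prev.key === curr.key && prev.index > curr.index) {
      return false; // equal keys were reordered, so the sort is not stable
    }
  }
  return true;
}

console.log(isSortStable(1000));
```

A large array is used on purpose, since some engines have switched algorithms depending on the array length.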

As an aside, a comparator that works in Firefox may fail in Webkit (Chrome and Safari) if it returns zero, or a boolean false, when the first argument should sort first; it needs to return a negative number such as -1 in that case. You can read more about Webkit’s Array.sort() handling.
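
To make the difference concrete, here is a small sketch of the two comparator styles. The function names are mine; the point is only that the second form always returns a negative number, zero or a positive number, which is what sort() expects in every engine.

```javascript
const letters = "NFARAKDMSLSJDQKJKCXLK".split("");

// Fragile: returns true or false, never a negative number, so the engine
// is never told that a should come before b.
function booleanComparator(a, b) {
  return a > b;
}

// Portable: negative when a sorts first, positive when b sorts first,
// zero when the two are equal.
function threeWayComparator(a, b) {
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// slice() copies the array so the original order is left alone
const sorted = letters.slice().sort(threeWayComparator).join("");
console.log(sorted); // AACDDFJJKKKKLLMNQRSSX
```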

With Java, it looks like this:

String unsorted = "xzya";
char[] content = unsorted.toCharArray();
java.util.Arrays.sort(content);
String sorted = new String(content);

With Perl, you can do something like:

print (join "", sort split //, $_);

In bash, this will do the same job:

echo "teststring" | grep -o . | sort | tr -d '\n'; echo

All in all, not only did I refresh my memory about good ways to sort strings and arrays in Javascript, I also learned some new things about sort algorithms and implementations of those algorithms.


Converting XML for Use in Teamsite

Suppose we have a collection of XML files containing content that we want to import into Teamsite. These items may be content from a database, content from some other CMS or a collection of existing files (such as an XHTML website). There are several things we need to accomplish in order to get the files ready for use with Teamsite’s Form Publisher, the first being to convert the items to the proper format.

Teamsite provides the user with a set of Perl utilities for manipulating content. To accomplish the various things I wanted to do without tedious command line activity, and lacking a readily available tool, I created a quick Perl script. It takes a set of files in a directory tree and runs them through Saxon twice: once to convert each file to a Data Capture Record (DCR) in a single directory, and once to create the file in the directory tree as it would be generated using the template. It then cleans up the various generated files. In the case of the DCRs, this includes removing the file extension, since DCRs don’t use the extension even though they’re XML files.

This is the script to accomplish all the necessary conversions. To do the XML conversion, it uses Saxon via a system call and is definitely not optimized for production use (I took this route mainly because I had already downloaded Saxon). I created it to run on Windows with ActiveState’s ActivePerl, rather than using Cygwin or running it on the server running Teamsite:

#!perl
use File::Find;
use File::Copy;
use File::Path;
use HTML::Entities; 
 
# Configuration:
$origContent_path = "/MyConvertDirectory/MyExportedContent";
$origConverted_path = "/MyConvertDirectory/MyConvertedXMLContent";
$origConvertedDCR_path = "/MyConvertDirectory/MyConvertedDCRContent";
$saxon_path = "C:\\Saxon\\saxon9.jar";
$xsl_path = "/MyConvertDirectory/convertXML.xsl";
$xsl2dcr_path = "/MyConvertDirectory/convertDCR.xsl";
 
my $dir = $origContent_path;
print "Starting at ".$dir."\n\n";
print "Processing...";
mkpath( $origConvertedDCR_path ) if not -e $origConvertedDCR_path;
use Cwd;
find(\&convertItems, $dir);
find(\&cleanupFilesDirs, $origConverted_path);
find(\&stripDCR, $origConvertedDCR_path);
print "\n";
print "Done!\n";
 
#
sub convertItems {
	return unless -d;
	my $sourcedir = $File::Find::name;
	my $targetdir = $File::Find::name;
	$targetdir =~ s/Home/SiteContent/o;
	$targetdir =~ s/\s//g;
	$targetdir =~ s/\b([A-Z]{1})([A-Za-z]+)\b/\l$1$2/g;
	print "Transform ".$sourcedir." to ".$targetdir."\n";
	print ".";
	mkpath( $targetdir ) if not -e $targetdir;
	system "java -jar $saxon_path -o \"$targetdir\" -s \"$sourcedir\" -xsl \"$xsl_path\"";
	system "java -jar $saxon_path -o \"$origConvertedDCR_path\" -s \"$sourcedir\" -xsl \"$xsl2dcr_path\"";
}
# Decode entities in the content
# Then rename each file to Unix format, removing spaces and lowercasing the first letter of the name.
# Since I'm exceptionally lazy, I just unlink the item if its size is under 39 bytes (meaning Saxon parsed the item, but the XSL transform returned just an XML header as the contents of the file)
# Except for the decoding and not stripping the extension, this method is the same as the method below
sub cleanupFilesDirs {
	-d and return; #skip item if it's a directory
	if(-s $_ < 39) {unlink $_ and return;}
	my $filename = $_;
	open(XMLCONTENT, "<$filename") or return;
	open(XMLOUT, ">temp.xml") or return;
	while (<XMLCONTENT>) {
		chomp;
		my $content = $_;
		$content = decode_entities($content)."\n";
		print XMLOUT $content;
	}
	close XMLCONTENT;
	close XMLOUT;
	unlink $filename;
	$filename =~ s/([A-Za-z])\s?-\s/$1-/g;
	$filename =~ s/\s//g;
	$filename =~ s/^([A-Z])/\l$1/;
#	print  "Renaming content item temp.xml to ".$filename."\n";
	move ("temp.xml","$filename");
}
# Rename each file to Unix format, removing spaces, lowercasing the first letter of the name and stripping off the extension.  Since DCR files reside in a single directory, the path is flattened.
# Since I'm exceptionally lazy, I just unlink the item if its size is under 39 bytes (meaning Saxon parsed the item, but the XSL transform returned just an XML header as the contents of the file)
sub stripDCR {
	-d and return; #skip item if it's a directory
	if(-s $_ < 39) {unlink $_; return;} #nothing left to rename once the file is deleted
	my $oldname = $_;
	my $newname = $oldname;
	chomp($newname);
	$newname =~ s/\s//g;
	$newname =~ s/^([A-Z])/\l$1/;
	$newname =~ s/_?\.xml$//;
	$newname =~ s/([A-Za-z])\s?-\s/$1 - /g;
#	print  "Renaming DCR item ".$oldname." to ".$newname."\n";
	if ($oldname eq $newname) { return;}
	rename($oldname, $newname);
}
exit;
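
The renaming rules in cleanupFilesDirs and stripDCR are easier to check in isolation. Here is a rough Javascript port of just the regular expressions; the function names normalizeFileName and normalizeDcrName are mine, for illustration, and the sample file names are made up. The hyphen substitution is left out of the DCR version because it requires whitespace after the hyphen, and by that point in stripDCR all whitespace has already been removed.

```javascript
// Mirrors the renaming in cleanupFilesDirs: tighten " - " to "-",
// remove all whitespace, lowercase a leading capital letter.
function normalizeFileName(name) {
  return name
    .replace(/([A-Za-z])\s?-\s/g, "$1-")
    .replace(/\s/g, "")
    .replace(/^([A-Z])/, function (match) {
      return match.toLowerCase();
    });
}

// Mirrors the renaming in stripDCR: remove whitespace, lowercase a leading
// capital letter, then strip a trailing ".xml" (or "_.xml").
function normalizeDcrName(name) {
  return name
    .replace(/\s/g, "")
    .replace(/^([A-Z])/, function (match) {
      return match.toLowerCase();
    })
    .replace(/_?\.xml$/, "");
}

console.log(normalizeFileName("Press Release - May.xml")); // pressRelease-May.xml
console.log(normalizeDcrName("My News Item_.xml")); // myNewsItem
console.log(normalizeDcrName("040108 New Item.xml")); // 040108NewItem
```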

The XSL files used are not provided in full because a) they’re relatively trivial and b) they’re customized on both ends (both based on the structure of the XML you’re converting and the XML being generated). In fact, the second XSL used to create the generated file from the DCR is essentially a representation of the Teamsite Template (.tpl) file and can be used to output XML, HTML, XHTML or whatever you need for your particular site.

Here’s the basic XSL to generate the DCR for Teamsite 6.4:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs = "http://www.w3.org/2001/XMLSchema"
    xmlns:date="http://exslt.org/dates-and-times"
    exclude-result-prefixes="xs date">
<xsl:output omit-xml-declaration="no" method="xml" indent="no" doctype-system="dcr4.5.dtd" />
<xsl:template match="/">
  <!-- matched root -->
   <xsl:apply-templates select="docTopNode"/>
 </xsl:template>
 
 <xsl:template match="docTopNode">
  <xsl:variable name="docTitle" select="title" />
  <record name="{$docTitle}" type="content"><item name="newsitem"><value>
   <item name="title"><value><xsl:value-of select="title"/></value></item>
   <item name="mainContent"><value><xsl:value-of select="docContent/docBodyContent"/></value></item>
   <item name="sequence"><value>1</value></item>
  </value></item></record>
 </xsl:template>
 
 <xsl:template name="FormatDate">
  <xsl:param name="DateTime" />
  <xsl:value-of select="concat(substring($DateTime,6,2),'/',substring($DateTime,9,2),'/',substring($DateTime,1,4))"/>
 </xsl:template>
 <xsl:template name="FormatTime">
  <xsl:param name="thisTime" />
  <xsl:value-of select="substring($thisTime,12,8)"/>
 </xsl:template>
 
 <xsl:function name="date:parse-date" as="xs:string">
  <xsl:param name="parsable-date" />
  <xsl:choose>
   <xsl:when test="$parsable-date = '' ">2001-01-01</xsl:when>
   <xsl:otherwise>
    <xsl:value-of select="xs:string(xs:dateTime(xs:dateTime('1970-01-01T00:00:00') + $parsable-date *
 xs:dayTimeDuration('PT0.001S')))" />
   </xsl:otherwise>
  </xsl:choose>
 </xsl:function>
</xsl:stylesheet>

This stylesheet would transform an XML document like:

<?xml version="1.0"?>
<docTopNode>
	<title>Document Title</title>
	<docContent>
		<docBodyContent>
			&lt;p&gt;This might be some escapedHTML content with formatting&lt;/p&gt;
		</docBodyContent>
		<unusedContentNode>
			This should be ignored by the XSL.
		</unusedContentNode>
		<postingDate>1148281200000</postingDate>
	</docContent>
</docTopNode>

In this document, note the format of the date. This is the date in milliseconds since January 1st, 1970, as it comes back from MS SQL Server. To convert it for Teamsite’s use, I just needed an XPath expression (the reverse of the function to calculate milliseconds since 1970 in XPath 2.0).
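
To sanity check the arithmetic outside of Saxon, the same conversion can be sketched in Javascript. The function names below are my own and only loosely mirror date:parse-date, FormatDate and FormatTime from the stylesheet. Note that toISOString() always reports UTC, and I trim its millisecond and zone suffix so the result has roughly the shape the XPath expression produces.

```javascript
// Loosely mirrors date:parse-date: an empty value falls back to 2001-01-01,
// otherwise the milliseconds are added to the 1970-01-01T00:00:00 epoch.
function parseDate(millis) {
  if (millis === "") return "2001-01-01";
  // Keep "YYYY-MM-DDTHH:MM:SS" and drop the trailing ".000Z"
  return new Date(Number(millis)).toISOString().slice(0, 19);
}

// Mirrors FormatDate: XPath substring() is 1-based, slice() is 0-based
function formatDate(dateTime) {
  return dateTime.slice(5, 7) + "/" + dateTime.slice(8, 10) + "/" + dateTime.slice(0, 4);
}

// Mirrors FormatTime: the eight characters after the "T"
function formatTime(dateTime) {
  return dateTime.slice(11, 19);
}

const posted = parseDate("1148281200000");
console.log(posted); // 2006-05-22T07:00:00
console.log(formatDate(posted)); // 05/22/2006
console.log(formatTime(posted)); // 07:00:00
```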

The one remaining step after everything is converted and imported to Teamsite is to set the associations in the files.
List the file attributes:
/opt/teamsite/bin/iwextattr -l /pathtofile/file.xml
Set an attribute:
/opt/teamsite/bin/iwextattr -s attribute=value /pathtofile/file.xml
For example:
/opt/teamsite/bin/iwextattr -s TeamSite/Templating/DCR/Type=site/newsitem \
  //teamsitehome/WORKAREA/shared/templatedata/site/newsitem/data/040108NewItemDCR

and, on the generated item:
/opt/teamsite/bin/iwextattr -s TeamSite/Templating/PrimaryDCR=040108NewItemDCR \
  //teamsitehome/WORKAREA/pages/site/040108NewItem.xml
