my recent reads..

On Parsing CSV and other Delimited/Quoted Formats

Parsing delimited text that may have quoted elements is a perennial requirement. Quick-and-dirty parses can be achieved with regular expressions, but for more flexible and encapsulated parsing I've been checking out the opencsv Java library. Hat tip to Jakub Pawlowski for highlighting the library on his blog.

A Regular Expression Approach
Just recently I released and blogged about a JDeveloper Filter Add-in, and it contains a class called ExecShell [API, source] which needs to know how to break a command line into its component arguments. The command line is of course space-delimited, but may use quotes to group an argument with embedded spaces (so a simple split on spaces won't do).

The salient code below uses the REGEX to chop theCmdLine String into theCmdArray Vector of arguments:

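import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Vector<String> theCmdArray = new Vector<String>(0);
// Alternatives: quoted group | run of non-whitespace | lone whitespace
String REGEX = "\"([^\"]+?)\"\\s?|([^\\s]+)\\s?|\\s";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(theCmdLine);
while (m.find())
{
    // m.group() is the whole match, so quoted arguments keep their quotes
    String arg = m.group().trim();
    // a lone-whitespace match trims to "", so skip it
    if (!arg.isEmpty()) theCmdArray.add(arg);
}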

The regular expression bears a little explaining, and is inspired by this example. Here's how it breaks down:

\"([^\"]+?)\"\\s?
Matches a group within double-quotes. The group is a lazy match on one or more characters except double-quote, optionally followed by a whitespace character.
|([^\\s]+)\\s?
or matches a group delimited by whitespace, optionally followed by a whitespace character.
|\\s
or matches a lone whitespace character, which is discarded.
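To see the three alternatives at work, here is a minimal, self-contained sketch that applies the same regular expression to a sample command line. The class name CmdLineSplitDemo and the sample input are hypothetical, and the sketch varies slightly from the snippet above: it reads the capture groups rather than the whole match, so the surrounding quotes are dropped and lone-whitespace matches add nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of ExecShell.
class CmdLineSplitDemo {
    // The regular expression from the post, unchanged.
    static final Pattern ARG =
        Pattern.compile("\"([^\"]+?)\"\\s?|([^\\s]+)\\s?|\\s");

    // Splits a command line into arguments, dropping surrounding quotes.
    static List<String> split(String cmdLine) {
        List<String> args = new ArrayList<>();
        Matcher m = ARG.matcher(cmdLine);
        while (m.find()) {
            if (m.group(1) != null) {
                args.add(m.group(1));   // quoted argument, quotes excluded
            } else if (m.group(2) != null) {
                args.add(m.group(2));   // unquoted argument
            }
            // otherwise the match was a lone whitespace character: skip it
        }
        return args;
    }

    public static void main(String[] unused) {
        // prints [javac, -d, my classes, Foo.java]
        System.out.println(split("javac -d \"my classes\"   Foo.java"));
    }
}
```

Whether to keep or drop the quotes depends on what consumes the arguments; m.group() keeps them, the capture groups do not.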

In this case, we are using whitespace as the delimiter (appropriate for command lines). The regex can be adapted for other delimiters by replacing \\s with the delimiter. For example, to handle a comma-separated format:
String REGEX = "\"([^\"]+?)\",?|([^,]+),?|,";
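A rough sketch of the comma variant in use follows. Note that with a comma delimiter the whole match includes the trailing comma, which trim() does not remove, so this sketch reads the capture groups instead. The class name CommaSplitDemo is hypothetical, and treating a lone comma as an empty field is my assumption rather than anything the regex dictates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the comma-separated variant of the regex.
class CommaSplitDemo {
    static final Pattern FIELD =
        Pattern.compile("\"([^\"]+?)\",?|([^,]+),?|,");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                fields.add(m.group(1));   // quoted field, quotes excluded
            } else if (m.group(2) != null) {
                fields.add(m.group(2));   // unquoted field
            } else {
                fields.add("");           // lone comma: assumed to mean an empty field
            }
        }
        return fields;
    }

    public static void main(String[] unused) {
        // prints [alpha, beta, gamma, delta]; the middle field is "beta, gamma"
        System.out.println(split("alpha,\"beta, gamma\",delta"));
    }
}
```

This remains a quick-and-dirty parse: a trailing empty field, a doubled quote inside a quoted field, or a space between a comma and an opening quote are not handled the way a full CSV parser would handle them.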

Using OpenCSV
The same space-delimited parsing requirement can be met with a couple of lines and the opencsv library:
CSVReader reader = new CSVReader(new StringReader(theCmdLine), ' ');
String[] s = reader.readNext();

Simple, yet currently not so robust. Since we define the delimiter to be a single space (overriding the default comma), other whitespace characters (like a tab) will not be recognised. Further, repeated spaces will not be coalesced; each one will be treated as the delimiter for a new element.

Internally, CSVReader parses the input character-by-character, so adapting it to handle repeated delimiters as one would be reasonably straightforward.
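To illustrate the idea only, here is a minimal character-by-character splitter that treats any run of whitespace as a single delimiter and honours double quotes. This is not opencsv's code; the class name CoalescingSplitter is hypothetical, and the handling of an unterminated quote (the rest of the line is taken as quoted) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not taken from opencsv.
class CoalescingSplitter {
    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;   // inside a double-quoted section?
        boolean inToken = false;    // has the current token started?
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // toggle quoting; the quote is not kept
                inToken = true;         // so that "" yields an empty token
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (inToken) {          // first delimiter after a token ends it
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
                // any further whitespace in the run is ignored
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(current.toString());   // flush the final token
        }
        return tokens;
    }

    public static void main(String[] unused) {
        // prints [ls, -l, my file.txt, /tmp]
        System.out.println(split("ls  -l\t\"my file.txt\"   /tmp"));
    }
}
```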

The Right Way To Do Wrong - a good read for security buffs


The Right Way To Do Wrong - An Exposé of Successful Criminals is a very old book, published in 1906. I was intrigued since it was written by Harry Houdini, and I hadn't realised he was also an author.

Houdini's motive for writing the book is to warn off the righteous by educating them in all forms of devious frauds and scams, and to cause those less well intentioned to give pause before taking up a life of crime.

Reading the book over 100 years after publication, I am amazed - but perhaps on reflection not surprised - that Houdini manages to describe in great detail just about every Internet-related scam in existence (allowing of course for a transposition of technology)!

When he talks of Begging Letter Swindles, think Nigerian Letter or "419" Fraud. For Tricks of Bunco Men, see Advance Fee Scheme. Then there is the ease with which Impersonation/Identity Fraud was practiced in a pre-IT age... and just about every other gambit you can find on the FBI Common Fraud Schemes site, or in this great article on Worst Online Scams and Internet Frauds + tips for avoiding them.

If you are into IT Security, I think you'd enjoy reading this and mulling over the relevance to your day-to-day work. It is salutary to realise there is nothing new in the Evil that Men do, just new ways of doing it!

I listened to The Right Way To Do Wrong in audio from LibriVox. It is also available in print from Amazon.


The Boat


The Boat is Walter Gibson's extraordinary account of survival after being lost at sea when the Dutch steamer Rooseboom was sunk by torpedo on 2 Mar 1942. Gibson survived 26 days afloat in a lifeboat with 135 others. Only 5 made it to shore. Three Javanese sailors were separated, to an uncertain fate. A fourth was Doris Lim, who after surviving the boat died tragically under Japanese interrogation before ever tasting freedom again.

When the story first broke in the news, the world was shocked by the tales of murder and cannibalism that Gibson witnessed aboard the lifeboat. It is perhaps made even more horrific by the stark, concise manner in which Gibson recounts The Boat (completed for the 10th anniversary of the Rooseboom's sinking).

Find it at the NLB
