Monday, October 02, 2006

I have a lot of respect for MSDN technical articles. More often than not they are a source of knowledge one would be hard pressed to obtain from official documentation. Kraig Brockschmidt's treatise on OLE internals, Nancy Winnick Cluts's explanation of TAPI, Icon internals by John Hornick: all of these are precious gems of knowledge. Heck, I even wrote a few myself. Over the years some were brilliant and concise, others less interesting. Almost every article had some code posted with it. The code samples also varied in quality, but not so much as to raise an eyebrow.

On several occasions I had this conversation with my teenage daughter, the gist of which was that even though the Algebra lesson is not an English lesson, it does not mean she can disregard grammar completely while doing her math homework. Apparently it was not obvious to her that writing properly is not something you do only when you absolutely have to. Similarly, I suppose that when you write code, even throwaway code, you should still be conscious of how you do it.

When I first came across this, my reaction was: this is good stuff. It'll teach developers not to use an old, outdated control and show how to replace it with modern alternatives. And then I saw the following gem:

If you must search compiled code, you can look certain patterns that represent the GUIDs. For example, the following GUID:

{ABCDEFGH-IJKL-MNOP-QRST-UVWXYZ012345}

becomes the following hexadecimal sequence in binary:

GH EF CD AB KL IJ OP MN QR ST UV WX YZ 01 23 45

 

Er, what hexadecimal sequence?
But it was followed by a code sample:

// Compile and execute:  "FindGUIDs YourApplication.exe"
using System;
using System.Text;
using System.IO;

namespace FindGUIDs {
class Program {
  static void Main(string[] args) {
    FileStream    fs = File.OpenRead(args[0]);
    StringBuilder sb = new StringBuilder();
    do {
      Int32 b = fs.ReadByte();
      if (-1 == b) {
        break;
      }
      sb.AppendFormat("{0:X2}", b);
    } while (true);
    fs.Close();
    String s = sb.ToString();
    if (s.Contains("E0A58D4371F1D011984E0000F80270F8"))
      Console.Out.WriteLine("GUID for TriEditDocument Class detected.");
    if (s.Contains("DFA58D4371F1D011984E0000F80270F8")) {
      Console.Out.WriteLine(
        "GUID for ITriEditDocument Interface detected.");
    }
    if (s.Contains("0002362DF5FFd1118D0300A0C959BC0A")) {
      Console.Out.WriteLine("GUID for DHTMLEdit Class detected.");
    }
    if (s.Contains("91B504CE1F2Bd2118D1E00A0C959BC0A")) {
      Console.Out.WriteLine("GUID for IDHTMLEdit Interface detected.");
    }
  }
}
}

Basically, the application is trying to find occurrences of a GUID in a binary file. I've seen many approaches to searching for a binary string in a file. Some were simpler to implement, others more efficient but harder to understand. This one takes the cake. In a nutshell, the code reads a binary file byte by byte and converts each byte into its two-character hexadecimal string representation, which is then appended to a StringBuilder. Once the entire file is loaded into the StringBuilder (consuming filesize * 4 bytes of memory, since each byte becomes two 2-byte characters), the StringBuilder is used to produce a string (another memory allocation of the same size; edited: no, this is actually done in place. Thanks, Duncan!), and that string is searched for a GUID substring.
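For comparison, here is a minimal sketch of searching the raw bytes directly, with no hex string in between. It is not the MSDN article's code. The GuidScanner class and its IndexOf helper are hypothetical names made up for this illustration, and the GUID literal is simply the one whose byte layout matches the first hex pattern in the sample. The scan is deliberately naive, which is adequate for a 16-byte pattern.

```csharp
using System;
using System.IO;

// Hypothetical sketch: search a file's raw bytes for a GUID
// without converting the file to a hex string first.
static class GuidScanner
{
    // Returns the index of the first occurrence of needle in haystack, or -1.
    // Naive scan: compares the pattern at every possible start offset.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j])
            {
                j++;
            }
            if (j == needle.Length)
            {
                return i;
            }
        }
        return -1;
    }

    static void Main(string[] args)
    {
        // Guid.ToByteArray() returns the binary layout: the first three
        // fields are byte-swapped, the last eight bytes keep their order.
        // For this GUID that is E0 A5 8D 43 71 F1 D0 11 98 4E 00 00 F8 02 70 F8.
        byte[] pattern =
            new Guid("438DA5E0-F171-11D0-984E-0000F80270F8").ToByteArray();

        // Still reads the whole file, but as bytes: 1x the file size.
        byte[] file = File.ReadAllBytes(args[0]);

        if (IndexOf(file, pattern) >= 0)
        {
            Console.WriteLine("GUID for TriEditDocument Class detected.");
        }
    }
}
```

Working on bytes also appears to sidestep two quirks of the hex-string approach: a substring match can begin in the middle of a byte (on an odd character offset), and patterns spelled with a lowercase "d" cannot match the uppercase output of "{0:X2}", because String.Contains performs an ordinal, case-sensitive comparison.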

I won't even go into the efficiency of the string search as used above. My point is that you either do things right, or you sidestep the whole issue by not providing the code sample (not really needed in the context of this article) and leaving it as an exercise for the reader.

Some screens later in the article we find the following gem of a regular expression (JScript, searching an HTML document for a tag):

var rex = new RegExp("]*>", "i");

Er, what happened to the non-greedy qualifiers? What is that \s doing inside []? This feels like something written by a person who does not use JavaScript day to day.

Please, please, let's keep the MSDN technical library standards high. After all, we all benefit from better-written code.

10/2/2006 1:23:53 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [3]  | 
10/4/2006 7:40:48 AM (Pacific Daylight Time, UTC-07:00)
Agreed, that article sucks. However, while we are on the subject of correctness the following statement:

"StringBuilder is used to produce a string (another memory allocation of the same size)"

Is IMO incorrect. My understanding is that an instance of StringBuilder will basically take care of resizing an underlying array as required when Append or AppendFormat are called. It then hands off an instance of the String class pointing to that very same array when ToString() is called.

The memory allocation overhead actually comes when the StringBuilder has to 'grow' by creating a new, larger, array and copying the contents of the old one into it.
duncan
10/4/2006 10:39:32 AM (Pacific Daylight Time, UTC-07:00)
> "StringBuilder is used to produce a string (another memory allocation of the same size)"

> Is IMO incorrect. My understanding is that an instance of StringBuilder will basically take care of resizing an
> underlying array as required when Append or AppendFormat are called. It then hands off an instance of the String
> class pointing to that very same array when ToString() is called.

I thought about it. My concern is that since strings are immutable, how can SB use its internal storage (subject to subsequent changes, disposal, etc.) as the string data? Wouldn't it have to create a clone? OTOH, what you say makes sense. Let me see if I can track this down.
Alex Feinman
10/5/2006 4:16:02 AM (Pacific Daylight Time, UTC-07:00)
It got lodged in my brain some time back:
http://www.dotnetrocks.com/default.aspx?showID=88
(check the date!)
transcript at:
http://www.code-magazine.com/Article.aspx?quickid=0501091
duncan