One Word: Cute
27. January, 2009
Another Lesson on Performance
23. January, 2009
Just another story you can tell someone who fears that “XYZ might be too slow”:
I’m toying with the idea of writing a new text editor. I mean, I’ve written my own OS, my own XML parser and I once maintained XDME, an editor written originally by Matthew Dillon. XDME has a couple of bugs and major design flaws that I always wanted to fix but never really got around to. Anyway.
So what is the best data structure for a text editor in 2008? List of lines? Gap-Buffer? Multi-Gap-Buffer?
XDME would keep the text in a list of lines and each line would point to a character array with the actual data. When editing, the characters would be copied into an edit buffer, the changes made and after the edit, the changed characters would be copied back, allocating a new character array if necessary.
This worked, it was a simple design but it had a flaw: it didn’t scale. The length of a line was limited to the max size of the edit buffer and loading a huge file took ages because each line was read, analyzed, memory was allocated … you get the idea.
So I wanted to make it better. Much better. I’d start with reading the file into a big buffer, chopped into evenly sized chunks to make reading both fast and memory efficient (imagine loading a 46MB file into a single memory chunk – after a couple of changes, I’d need to allocate a second 46MB chunk, copy the whole stuff over, etc, needing twice the amount of RAM for a short time).
During the weekend, I mulled the various ideas over, planned, started with a complex red-black tree structure for markers (positions in the text that move when you insert before them). It’s huge, complex. It screams “wrong way!”
So today, I sat back and did what I should have done first: Get some figures. How much does it really cost to copy 4MB of RAM? Make a guess. Here is the code to check:
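public class CopyBenchmark
{
    public static void main (String[] args)
    {
        long start = System.currentTimeMillis ();
        int N = 10000;
        for (int i=0; i<N; i++)
        {
            // 1M ints = 4MB; note that the allocation is timed, too
            int[] buffer = new int[1024*1024];
            // Shift the whole array content by one element
            System.arraycopy (buffer, 0, buffer, 1, buffer.length-1);
        }
        long duration = System.currentTimeMillis () - start;
        System.out.println (duration);      // total time in ms
        System.out.println (duration / N);  // average time per iteration in ms
    }
}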
On my machine, this prints “135223” and “13”. That’s thirteen milliseconds to allocate and copy 4MB of RAM. Okay. It’s obviously not worth spending a second thinking about the cost of moving data around in a big block of bytes.
That leaves the memory issue. I would really like to be able to load and edit a 40MB file in a VM which has 64MB heap. Also, I would like to be effective loading a file with 40MB worth of line-feeds as well as a file which contains just a single line with 40MB data in it.
But this simple test has solved one problem for me: I can keep the lines in an ArrayList for fast access and need not worry too much about performance. The actual character data needs to go into a chunked memory structure, though.
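To make the “chunked memory structure” a bit more concrete, here is a minimal sketch of how I imagine it. Everything in it is an assumption for illustration: the class name ChunkedText, the chunk size and the use of StringBuilder are not taken from XDME or from any real editor. The idea is only that an insert touches a single small chunk instead of one huge array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: text kept in small chunks so an insert moves little data. */
class ChunkedText {
    private final int chunkSize;
    private final List<StringBuilder> chunks = new ArrayList<>();

    ChunkedText(int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        this.chunkSize = chunkSize;
        chunks.add(new StringBuilder());
    }

    /** Total number of characters in all chunks. */
    int length() {
        int n = 0;
        for (StringBuilder c : chunks) {
            n += c.length();
        }
        return n;
    }

    int chunkCount() {
        return chunks.size();
    }

    /** Inserts text at pos; only the affected chunk is modified. */
    void insert(int pos, String text) {
        if (pos < 0 || pos > length()) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        // Walk the chunk list until the offset falls inside a chunk
        int index = 0;
        int offset = pos;
        while (index < chunks.size() - 1 && offset > chunks.get(index).length()) {
            offset -= chunks.get(index).length();
            index++;
        }
        StringBuilder chunk = chunks.get(index);
        chunk.insert(offset, text);
        // Split the chunk until every piece fits into chunkSize again
        while (chunk.length() > chunkSize) {
            String tail = chunk.substring(chunkSize);
            chunk.setLength(chunkSize);
            chunk = new StringBuilder(tail);
            chunks.add(++index, chunk);
        }
    }

    @Override
    public String toString() {
        StringBuilder all = new StringBuilder();
        for (StringBuilder c : chunks) {
            all.append(c);
        }
        return all.toString();
    }
}
```

A real editor would probably use much larger chunks (a few KB), merge chunks that become too small and cache the chunk offsets instead of walking the list each time. The sketch does none of that.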
Moral: There is no way to tell the performance of a piece of code by looking at it.
25 Most Dangerous Programming Errors
23. January, 2009
If you want to improve your l33t [0d|ng skillz, especially keeping script kiddies off your back, here is a list of the 25 most common coding errors: http://www.sans.org/top25errors/
Sorting for Humans: Natural Sort Order
22. January, 2009
Kate Rhodes sums it up nicely:
Silly me, I just figured that alphabetical sorting was such a common need (judging by the number of people asking how to do it I’m not wrong either) that I wouldn’t have to write the damn thing. But I didn’t count on the stupid factor. Jesus Christ people. You’re programmers. You’re almost all college graduates and none of you know what the f**k “Alphabetical” means. You should all be ashamed. If any of you are using your language’s default sort algorithm, which is almost guaranteed to be ASCIIbetical (for good reason) to get alphabetical sorting you proceed to the nearest mirror and slap yourself repeatedly before returning to your desks and fixing your unit tests that didn’t catch this problem.
So if you want to sort your lists the right way (instead of the ASCII way), read this.
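For the impatient, here is a rough sketch of what a natural order comparator could look like in Java. This is my own illustration, not the code from the linked article: the class name NaturalOrder is made up, it only understands ASCII digits, it compares letters case-insensitively and it ignores locale-specific collation completely.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: compares runs of digits by value, everything else by character. */
class NaturalOrder implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Skip leading zeros so "007" and "7" have the same value
                int si = i;
                int sj = j;
                while (si < a.length() && a.charAt(si) == '0') si++;
                while (sj < b.length() && b.charAt(sj) == '0') sj++;
                // Find the end of both digit runs
                int ei = si;
                int ej = sj;
                while (ei < a.length() && isDigit(a.charAt(ei))) ei++;
                while (ej < b.length() && isDigit(b.charAt(ej))) ej++;
                // More significant digits means the bigger number
                int lenDiff = (ei - si) - (ej - sj);
                if (lenDiff != 0) return lenDiff;
                // Same number of digits: plain string comparison is numeric comparison
                int cmp = a.substring(si, ei).compareTo(b.substring(sj, ej));
                if (cmp != 0) return cmp;
                i = ei;
                j = ej;
            } else {
                int cmp = Character.compare(Character.toLowerCase(ca),
                                            Character.toLowerCase(cb));
                if (cmp != 0) return cmp;
                i++;
                j++;
            }
        }
        // The string with nothing left comes first; fall back to ASCII order for ties
        int rest = (a.length() - i) - (b.length() - j);
        return rest != 0 ? rest : a.compareTo(b);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        String[] files = { "file10.txt", "file2.txt", "File1.txt", "file1.txt" };
        Arrays.sort(files, new NaturalOrder());
        // prints [File1.txt, file1.txt, file2.txt, file10.txt]
        System.out.println(Arrays.toString(files));
    }
}
```

The default String order would put “file10.txt” before “file2.txt”, which is exactly the ASCIIbetical behavior the quote complains about.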
Holding a Program in One’s Head
18. December, 2008
In my perpetual search for brain food for programmers, I’ve found this article: Holding a Program in One’s Head
Writing Testable Code
1. December, 2008
Just stumbled over this article: “Writing Testable Code”. Apparently, it’s a set of rules which Google uses. While I’m not a 100% fan of Google, this is something every developer should read and understand.
How To Be Agile
29. November, 2008
The article “When Agile Projects Go Bad” got me thinking. I’ve talked to many people about XP and Agile Development and TDD and the usual question is: “How do we make it work?” And the next sentence is: “This won’t work with us because we can’t do this or that.”
This is a general misconception which comes from the … uh … “great” methodologies which you were taught in school: the waterfall model, the V model, the old dinosaurs. They told you: “You must follow the rules to the letter or doom will rain on your head!” Since you could never follow all the rules, they could easily say “Told you so!” when things didn’t work out.
Agile development is quite different in this respect. First of all, it assumes that you’re an adult. That you have a brain and can actually use it. It also assumes that you want to improve your situation. It also assumes nothing else.
When a company is in trouble, it will call for help. Expensive external advisers will be called, they will think about the situation for a long time (= more money for them). After a while (when the new yacht is in the dry), they will come up with what’s wrong and how to fix it. Did you know that in most companies in trouble, the external advisers will just repeat what they heard from the people working there?
It’s not that people don’t know what’s wrong, it’s just not healthy to mention it … at least if you want to work there. So people walk around, with the anger in their hearts and the fist in the pocket and nothing will happen until someone from the outside comes in and states the obvious. Can’t happen any other way because if it could, you wouldn’t be in this situation in the first place.
Agile Development is similar. It acknowledges that you’re smart and that you know what’s wrong and that you don’t have the power to call in help. What it does is it offers you a set of tools, things that have worked for other people in the past and some of them might apply to you. Maybe all. Probably not. Most likely, you will be able to use one or two. That doesn’t sound like much but the old methodologies are pretty useless if you can’t implement 90%+. Agile is agile. It can bend and twist and fit in your routine.
So you’re thinking about doing TDD. Do you have to ask your boss? No. Do you have to get permission from anyone? No. Do you have to tell anyone? No. Can you do it any time you like, as often as you like, stop at will? Yes. If it doesn’t work for you in your situation, for the current project, then don’t use it. No harm done, nothing gained either.
But if you can use it, every little bit will help. Suddenly, you will find yourself to be able to deliver on time. Your code will work and it will be much more solid than before. You will be able to do more work in less time. People will notice. Your reputation will increase. And eventually, they will be curious: How do you do it? “TDD.” What’s that?
You win.
Be agile. Pick and choose. Pick what you think will work, try it, drop it if it doesn’t deliver. And if it works, try the next thing. Evolve. Become the better you.
Agile is not a silver bullet. It won’t miraculously solve all your issues. You still have to think and be an adult about your work. It’s meant to be that way. I don’t do every Agile practice every day. Sometimes, I don’t even TDD (and I regret it every time). But I always return because life is just so much simpler.
Navigating SharePoint Folders With Axis2
26. November, 2008
I’ve just written some test code to get a list of items in a SharePoint folder with Apache Axis2 and since this was “not so easy”, I’ll share my insights here.
First, you need Axis2. If you’re using Maven2, put this in your pom.xml:
<dependency>
    <groupId>org.apache.axis2</groupId>
    <artifactId>axis2-kernel</artifactId>
    <version>1.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.axis2</groupId>
    <artifactId>axis2-adb</artifactId>
    <version>1.4.1</version>
</dependency>
Next stop: Setting up NTLM authentication.
import java.util.ArrayList;
import java.util.List;
import org.apache.axis2.transport.http.HttpTransportProperties;
import org.apache.commons.httpclient.auth.AuthPolicy;

HttpTransportProperties.Authenticator auth =
    new HttpTransportProperties.Authenticator();
auth.setUsername ("username");
auth.setPassword ("password");
auth.setDomain ("ntdom");
auth.setHost ("host.domain.com");
// Offer only NTLM as authentication scheme
List<String> authPrefs = new ArrayList<String> (1);
authPrefs.add (AuthPolicy.NTLM);
auth.setAuthSchemes (authPrefs);
This should be the username/password you’re using to login to the NT domain “ntdom” on the NT domain server “host.domain.com”. Often, this server is the same as the SharePoint server you want to connect to.
If the SharePoint server is somewhere outside your intranet, you may need to specify a proxy:
HttpTransportProperties.ProxyProperties proxyProperties =
new HttpTransportProperties.ProxyProperties();
proxyProperties.setProxyName ("your.proxy.com");
proxyProperties.setProxyPort (8888);
You can get these values from your Internet browser.
If there are several SharePoint “sites” on the server, set site to the relative URL of the site you want to connect to. Otherwise, leave site empty. If you have no idea what I’m talking about, browse the SharePoint server in Internet Explorer. In the location bar, you’ll see a URL like this: https://sp.company.com/projects/demo/Documents2/Forms/AllItems.aspx?RootFolder=%2fprojects%2fdemo%2fDocument2%2f&FolderCTID=&View=%7b18698D80%2dE081%2d4BBE%2d96EB%2d73BA839230B9%7d. Scary, huh? Let’s take it apart:
https:// = the protocol,
sp.company.com = The server name (with domain),
projects/demo = The “site” name
Documents2 = A “list” stored on the site “projects/demo”
/Forms/AllItems.aspx?RootFolder=... is stuff to make IE happy. Ignore it.
So in our example, we have to set site to:
String site = "/projects/demo";
Mind the leading slash!
To verify that this is correct, replace “/Documents2/Forms/” and anything beyond with “/_vti_bin/Lists.asmx?WSDL”. That should return the WSDL definition for this site. Save the result as “sharepoint.wsdl” (File menu, “Save as…”). Install Axis2, open a command prompt in the directory where you saved the WSDL file and run this command (don’t forget to replace the Java package name):
%AXIS2_HOME%\bin\WSDL2Java -uri sharepoint.wsdl -p java.package.name -d adb -s
This will create a “src” directory with the Java package and a single file “ListsStub.java”. Copy it into your Maven2 project.
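Since the leading slash is easy to get wrong, the two addresses used in this article (the web service endpoint and its WSDL) can be built by a small helper. This is just a sketch: the class SharePointUrls and its methods are my own invention for illustration and not part of Axis2 or SharePoint.

```java
/** Hypothetical helper: builds the Lists.asmx addresses from server and site. */
class SharePointUrls {
    /** Returns "" for the root site, otherwise the site with one leading and no trailing slash. */
    static String normalizeSite(String site) {
        if (site == null || site.isEmpty()) {
            return "";
        }
        String s = site.startsWith("/") ? site : "/" + site;
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** Address of the Lists web service, as passed to the generated stub. */
    static String listsEndpoint(String server, String site) {
        return server + normalizeSite(site) + "/_vti_bin/Lists.asmx";
    }

    /** Address that should return the WSDL definition for the site. */
    static String listsWsdl(String server, String site) {
        return listsEndpoint(server, site) + "?WSDL";
    }
}
```

With the example from above, SharePointUrls.listsWsdl ("https://sp.company.com", "projects/demo") gives the address you can paste into the browser to download the WSDL.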
Now, we can get a list of the lists on the site:
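// Needs: import org.apache.axis2.client.Options;
//        import org.apache.axis2.transport.http.HTTPConstants;
ListsStub lists = new ListsStub
    ("https://sp.company.com"+site+"/_vti_bin/Lists.asmx");
Options options = lists._getServiceClient ().getOptions ();
options.setProperty (HTTPConstants.AUTHENTICATE, auth);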
If you need a proxy, specify it here:
options.setProperty (HTTPConstants.HTTP_PROTOCOL_VERSION,
HTTPConstants.HEADER_PROTOCOL_10);
options.setProperty (HTTPConstants.PROXY, proxyProperties);
We need to reduce the HTTP protocol version to 1.0 because most proxies don’t allow sending multiple requests over a single connection. If you want to speed things up, you can try to comment out this line but be prepared to see it fail afterwards.
Okay. The plumbing is in place. Now we query the server for the lists it has:
String liste = "Documents2";
String document2ID;
{
ListsStub.GetListCollection req = new ListsStub.GetListCollection();
ListsStub.GetListCollectionResponse res = lists.GetListCollection (req);
displayResult (req, res);
document2ID = getIDByTitle (res, liste);
}
This downloads all lists defined on the server and searches for the one we need. If you’re in doubt about what the name of the list might be: Check the bread crumbs in the blue part in Internet Explorer. The first two items are the title of the site and the list you’re currently in.
displayResult() is the usual XML dump code:
private void displayResult (GetListCollection req,
GetListCollectionResponse res)
{
System.out.println ("Result OK: "
+res.localGetListCollectionResultTracker);
OMElement root = res.getGetListCollectionResult ()
.getExtraElement ();
dump (System.out, root, 0);
}
private void dump (PrintStream out, OMElement e, int indent)
{
indent(out, indent);
out.print (e.getLocalName ());
for (Iterator iter = e.getAllAttributes (); iter.hasNext (); )
{
OMAttribute attr = (OMAttribute)iter.next ();
out.print (" ");
out.print (attr.getLocalName ());
out.print ("=\"");
out.print (attr.getAttributeValue ());
out.print ("\"");
}
out.println ();
for (Iterator iter = e.getChildElements (); iter.hasNext (); )
{
OMElement child = (OMElement)iter.next ();
dump (out, child, indent+1);
}
}
private void indent (PrintStream out, int indent)
{
for (int i=0; i<indent; i++)
out.print (" ");
}
We also need getIDByTitle() to search for the ID of a SharePoint list:
private String getIDByTitle (GetListCollectionResponse res, String title)
{
OMElement root = res.getGetListCollectionResult ().getExtraElement ();
QName qnameTitle = new QName ("Title");
QName qnameID = new QName ("ID");
for (Iterator iter = root.getChildrenWithLocalName ("List"); iter.hasNext (); )
{
OMElement list = (OMElement)iter.next ();
if (title.equals (list.getAttributeValue (qnameTitle)))
return list.getAttributeValue (qnameID);
}
return null;
}
With that, we can finally list the items in a folder:
{
String dir = "folder/subfolder";
ListsStub.GetListItems req
= new ListsStub.GetListItems ();
req.setListName (document2ID);
QueryOptions_type1 query
= new QueryOptions_type1 ();
OMFactory fac = OMAbstractFactory.getOMFactory();
OMElement root = fac.createOMElement (
new QName("", "QueryOptions"));
query.setExtraElement (root);
OMElement folder = fac.createOMElement (
new QName("", "Folder"));
root.addChild (folder);
folder.setText (liste+"/"+dir); // <--!!
req.setQueryOptions (query);
GetListItemsResponse res = lists.GetListItems (req);
displayResult (req, res);
}
The important bit here is: To list the items in a folder, you must include the name of the list in the “Folder” element! For reference, this is the XML which is actually sent to the server:
<?xml version='1.0' encoding='UTF-8'?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
<soapenv:Body>
<ns1:GetListItems xmlns:ns1="http://schemas.microsoft.com/sharepoint/soap/">
<ns1:listName>{12AF2346-CCA1-486D-BE3C-82223DEC3F42}</ns1:listName>
<ns1:queryOptions>
<QueryOptions>
<Folder>Documents2/folder/subfolder</Folder>
</QueryOptions>
</ns1:queryOptions>
</ns1:GetListItems>
</soapenv:Body>
</soapenv:Envelope>
If the folder name is not correct, you’ll get a list of all files and folders that the SharePoint server can find anywhere. The folder names can be found in the bread crumbs. The first two items are the site and the list name, respectively, followed by the folder names.
The last missing piece is displayResult() for the items:
private void displayResult (GetListItems req,
GetListItemsResponse res)
{
System.out.println ("Result OK: "
+res.localGetListItemsResultTracker);
OMElement root = res.getGetListItemsResult ()
.getExtraElement ();
dump (System.out, root, 0);
}
If you run this code and you see the exception “unable to find valid certification path to requested target”, this article will help.
If the SharePoint server returns an error, you’ll see “detail unsupported element in SOAPFault element”. I haven’t found a way to work around this bug in Axis2. Try to set the log level of “org.apache.axis2” to “DEBUG” and you’ll see what the SharePoint server sent back (not that it will help in most of the cases …)
Links: GetListItems on MSDN, How to configure Axis2 to support Basic, NTLM and Proxy authentication?, Java to SharePoint Integration – Part I (old, for Java 1.4)
Good luck!
Stuck? Ask Stack Overflow
19. November, 2008
Stuck with a hard programming problem? Just solved an impossible problem and want to show the world your genius? Don’t know how to solve a problem with your favorite OS or programming language? Check out stackoverflow.com.
Testing the Impossible: Rules of Thumb
19. November, 2008
When people say “we can’t test that”, they usually mean “… with a reasonable effort”. They say “we can’t test that because it’s using a database” or “we can’t test the layout of the UI” or “to test this, we need information which is buried in private fields of that class”.
And they are always wrong. You can test everything. Usually with a reasonable effort. But often, you need to take a step back and do the unusual. Some examples.
So your app is pumping lots of data into a database. You can’t test the database. You’d need to scrap it for every test run and build it from scratch which would take hours or at least ages. Okay. Don’t test the database. Test how you use it. You’re not looking for bugs in the database, you’re looking for bugs in your code. Saying “but some bugs might get away” is just a lame excuse.
Here is what you need to do: Identify independent objects (which need no other objects stored in the database). Write tests for those. Put the test data for them in an in-memory database. HSQLDB and Derby are your friends. If you must, use your production database but make the schema configurable. Scrap the tables before the test and load them from clean template tables.
So you need some really spiffy SQL extensions? Put them in an isolated place and test them without everything else against the real database. You need to test that searching a huge amount of data works? Put that data in a static test database. Switch database connections during the tests. Can’t? Make that damn connection provider configurable at runtime! Can’t? Sure you can. If everything else fails, get the source with JAD, compile that into an independent jar and force that as the first thing into the classpath when you run your tests. Use a custom classloader if you must.
While this is not perfect, it will allow you to learn how to test. How to test your work. Testing is always different just like every program is different. Allow yourself to make mistakes and to learn from them. Tackle the harder problems after the easier ones. Make the tests help you learn.
So you have this very complex user interface. Which you can’t test. Not to mention that starting the app takes ten minutes and the UI changes all the time and … Okay. Stop the whining. Your program is running on a computer and for the same inputs, a computer should return the same outputs, right? Or did you just build a big random number generator? Something to challenge the Infinite Improbability Drive? No? Then you can test it. Follow me.
First, cut the code that does something from the code that connects said code to the UI. As a first simple step, we’ll just assume that pressing a button will actually invoke your method. If this fails for some reason, that reason can’t be very hard to find, so we can safely ignore these simple bugs for now.
After this change, you have the code that does stuff at the scruff. Now, you can write tests for it. Reduce entanglement. Keep separate issues separate. A friend of mine builds all his code around a central event service. Service providers register themselves and other parts of the code send events to do stuff. It costs a bit of performance but it makes testing as easy as overwriting an existing service provider with a mock-up.
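I don’t have my friend’s code at hand, so here is only a rough sketch of how such an event service might look. All names (EventService, Greeter, the event name “user.lookup”) are invented for this example. The point is that the code under test only knows the service, so a test can register a mock-up under the same event name and replace the real provider.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical central event service: one provider per event name. */
class EventService {
    private final Map<String, Function<String, String>> providers = new HashMap<>();

    /** Registers a provider; a later registration overwrites an earlier one. */
    void register(String event, Function<String, String> provider) {
        providers.put(event, provider);
    }

    /** Sends an event to whoever is registered for it and returns the answer. */
    String send(String event, String payload) {
        Function<String, String> provider = providers.get(event);
        if (provider == null) {
            throw new IllegalStateException("No provider registered for " + event);
        }
        return provider.apply(payload);
    }
}

/** Code under test: it talks to the event service, never to the database directly. */
class Greeter {
    private final EventService events;

    Greeter(EventService events) {
        this.events = events;
    }

    String greet(String userId) {
        return "Hello, " + events.send("user.lookup", userId) + "!";
    }
}
```

In a test, you would call events.register ("user.lookup", id -> "Test User") after the normal setup and then check that new Greeter (events).greet ("42") returns “Hello, Test User!”, without any database in sight.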
Your software needs an insanely complex remote server? How about replacing this with a small proxy that always returns the same answers? Or at least fakes something that looks close enough to a real answer to make your code work (or fail when you’re testing the error handling).
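A sketch of what I mean, under the assumption that your code talks to the server through some interface. RemoteServer, CannedServer and PriceClient are made-up names for this example; your real interface will look different. The fake returns the same canned answers every time and can be switched into a failure mode to test the error handling.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface through which the application talks to the server. */
interface RemoteServer {
    String request(String query) throws IOException;
}

/** Fake server for tests: always returns the same canned answers. */
class CannedServer implements RemoteServer {
    private final Map<String, String> answers = new HashMap<>();
    private boolean failing;

    CannedServer answer(String query, String answer) {
        answers.put(query, answer);
        return this;
    }

    /** Switch on to simulate an outage. */
    void setFailing(boolean failing) {
        this.failing = failing;
    }

    @Override
    public String request(String query) throws IOException {
        if (failing) {
            throw new IOException("simulated outage");
        }
        String answer = answers.get(query);
        if (answer == null) {
            throw new IOException("no canned answer for " + query);
        }
        return answer;
    }
}

/** Code under test: it only sees the interface. */
class PriceClient {
    private final RemoteServer server;

    PriceClient(RemoteServer server) {
        this.server = server;
    }

    String priceLabel(String item) {
        try {
            return item + ": " + server.request("price:" + item);
        } catch (IOException e) {
            return item + ": unavailable";
        }
    }
}
```

The same test can then check the happy path and, after setFailing (true), the error handling.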
And if you need data that some stubborn object won’t reveal, use the source, Luke (download the source and edit the offender to make the field public, remove “final” from all files, add a getter or make it protected and extend the class in the tests). If everything else fails, turn to java.lang.reflect.Field.setAccessible(true).
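For the last resort, a minimal sketch of reading a private field with reflection. Account and PrivateFieldReader are made-up classes for this example; only the java.lang.reflect calls are real. Note that this only looks at fields declared in the class itself, not in its superclasses, and that newer Java versions may refuse access to classes in other modules.

```java
import java.lang.reflect.Field;

/** Stubborn example class which won't reveal its state. */
class Account {
    private int balance = 100;

    void deposit(int amount) {
        balance += amount;
    }
}

/** Hypothetical test helper: reads a private field declared in the object's own class. */
class PrivateFieldReader {
    static Object read(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // switch off the access check for this Field object
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(50);
        System.out.println(read(account, "balance")); // prints 150
    }
}
```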
If you’re using C/C++, always invoke methods via a trampoline: Put a pointer somewhere which contains the function to call and always use that pointer instead of the real function. Use header files and macros so no human can tell the difference. In your tests, bend those pointers. The Amiga did it in 1985. #ifdef is your friend.
If you’re using some other language, put the test code in comments and have a self-written preprocessor create two versions that you can compile and run.
If all else fails, switch to Python.
Posted by digulla