Torsten Curdt’s weblog

Controlling the photosphere

Photosynth is probably the coolest thing I have seen in a while. I would love to know how they gather all this information and construct a 3D image from it. Just absolutely amazing. Too bad they got bought by Microsoft.

TextMate writing meta data

For some reason some builds were failing for me as soon as I edited files with TextMate. It turns out TextMate writes some metadata into file attributes (or hidden files). Searching the manual, I found that you can disable this feature with

defaults write com.macromates.textmate OakDocumentDisableFSMetaData 1

Bye bye server4you – a review

Soon the contract for my dedicated server will run out. I was sharing a server from server4you.de with a few people, but more and more of them dropped out, and since we could not find anyone to fill the spots we decided to quit and move somewhere else. Well, that and the incredibly awful service. (Too many stories to tell!) Just to give some examples: when we got the server we wanted to run Debian instead of SuSE. The question whether they could just pop in a CD that we prepared was refused without giving a reason. So we ended up installing Debian from the swap partition. Yay!

But a much worse story is that they refused to reboot my server while I was down in Australia. I didn't have the password for the server admin at hand (I had left the records in Germany and my parents were on vacation). Due to a special number you cannot even reach the support hotline from abroad. Somehow I worked out a different number and finally talked to them on the phone. So imagine: my server was down. No access to the console. No access to the server admin website. I explained the situation. I was able to give ALL my details: customer number, even bank details, etc. They still refused to reboot the server. Instead they asked me to fax them a written request for a new admin password. As it was Saturday, that would not be processed before Monday. Then they wanted to send the new credentials to my home address. That would have meant at least 5 days of downtime. The best thing: due to the time difference I was calling 20 minutes before the end of their business day. You know what? They just hung up on me! "Sorry, Sir, I cannot help you." I was almost going mad. After some "meditation" I worked out the password and was able to reboot the server myself, so the downtime was just half a day. My complaint to the management was left unanswered.

After they even f..cked up the billing several times, I am soooo happy I finished moving all my services to a different provider yesterday.
If you consider server4you – just don’t. They might have a competitive pricing – but it’s just so not worth it.

Google Scalability Conference Report

While the travel to Seattle was horrible (I missed my connection and got stuck at Washington Dulles for one night; mental note: never ever Lufthansa again!), the conference and the city were quite nice after all. Not that I had much time for exploring Seattle, but it looked beautiful while I was hurrying through my to-see list in a few hours. (So green!) I did not even have the time to meet up with some Apache folks over there. (Sorry, Henry!) The conference itself was well organized (what else would you expect from Google) and covered some quite interesting topics. Unfortunately it was not as technical as one would have hoped. Still, there were some lessons learned that surely should be kept in mind while designing large-scale systems. The sessions are supposed to be available on Google Video/YouTube at some stage. So far I could only find one of them.

The sessions from Google itself were mostly introductions to MapReduce and BigTable. They were interesting, but held no big surprises. They even talked briefly about Hadoop as an open source implementation. Unfortunately there is no comparison between Hadoop and the Google in-house implementation yet …though they agreed that it might be interesting to see the results. A little annoying was that every remotely technical question was answered with "Sorry, can't talk about it". What I found most amazing, though, is that according to them all Google developers have access to the production machines.
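
Since the talks only introduced the programming model, here is a minimal single-machine sketch of the MapReduce idea in Python. The function names are illustrative, not Google's or Hadoop's actual API; the real systems of course distribute and shuffle the intermediate data across many machines.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user-supplied map function to every input record and
    group the emitted (key, value) pairs by key (the 'shuffle')."""
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)
    return intermediate

def reduce_phase(intermediate, reduce_fn):
    """Apply the reduce function to each key's list of values."""
    return {key: reduce_fn(key, values)
            for key, values in intermediate.items()}

# The classic word-count example
def word_count_map(line):
    for word in line.split():
        yield word, 1

def word_count_reduce(word, counts):
    return sum(counts)

lines = ["the quick brown fox", "the lazy dog"]
result = reduce_phase(map_phase(lines, word_count_map), word_count_reduce)
print(result["the"])  # 2
```

The appeal is that the user only writes the two pure functions; partitioning, scheduling, and fault handling stay in the framework.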

The session about the Lustre file system was quite interesting. It turns out it is one of the top 5 clustered file systems. It is fully POSIX compliant and supports cross-site deployments. People run it in highly critical installations, and the throughput they achieve is quite impressive. Setting up Lustre properly can take some time though; that is what the company behind Lustre provides consulting for.

A session about test selection talked about scaling with the number of tests that accumulate over the releases. Running all tests for a release takes longer and longer as the number of tests grows. In order to minimize the risk for emergency releases they developed a technique to statistically map code changes to test cases and requirements. Based on testing history they can then select certain tests and still be quite confident about the QA status. Not sure I really buy into this though. ;)
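
As far as I understood it, the core of the idea boils down to indexing which tests historically touched which files and intersecting that index with the change set. A toy sketch of that (all names and data made up, and without the statistical weighting they layer on top):

```python
def build_index(history):
    """history: iterable of (test_name, files_touched) pairs
    collected from past test runs. Returns file -> set of tests."""
    index = {}
    for test, files in history:
        for f in files:
            index.setdefault(f, set()).add(test)
    return index

def select_tests(index, changed_files):
    """Pick only the tests whose recorded history overlaps
    with the files changed in this release."""
    selected = set()
    for f in changed_files:
        selected |= index.get(f, set())
    return selected

# Hypothetical history data
history = [
    ("test_checkout", {"cart.py", "billing.py"}),
    ("test_search",   {"index.py"}),
    ("test_login",    {"auth.py"}),
]
index = build_index(history)
print(sorted(select_tests(index, {"billing.py"})))  # ['test_checkout']
```

The obvious weakness (and probably why I am skeptical) is that a change to a file no test has ever touched selects nothing at all.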

For me the most interesting session was indeed the one from the Amazon guys about infinite data storage. They emphasized that if something can go wrong, it will go wrong, and one should plan for that. According to them you only truly scale with multiple data centers across the world. And you have to be prepared for what happens if a data center just dies! Not just becomes unreachable, but dies. They went through a couple of approaches until they finally arrived where they are now, from mainframes to distributed databases. It all just didn't work. They were especially bashing distributed databases: according to them it just happens that nodes run out of sync, for bizarre reasons, but they do. And then it gets messy. Two-phase commit was almost a "don't say that word" for them. They came up with their own system, as they believe there was (and still is) no product one can buy or open source project that fits their needs. Their contract is that every page (even with one data center down) should be delivered in less than 100ms. A machine or even a data center going down should not even be noticeable to the user. While reads are easier to scale, writes are not. Still, they require their writes (e.g. to the shopping cart) to return without any blocking. Their call was that versioning is the key to true scalability in this area. A quorum-based process makes sure nodes get synchronized properly across the cluster. At the application level they handle merge conflicts if there really are any. They have nothing one would call a master. Any machine can go down at any time. Load is balanced and data partitioned via consistent hashing. They did not go down the low-end machine route like Google did, but made no big investments into machines either. Again, developers have access to the production environment and are responsible for the contracts (like the 100ms). Definitely worth a watch once it's up on Google Video.
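
The consistent-hashing part can be sketched in a few lines. This is a toy ring with virtual nodes, not Amazon's implementation; the class and node names are made up:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: a key is owned by the first node
    found clockwise from the key's position on the ring."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self.ring = []            # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, chr(0x10FFFF)))
        if idx == len(self.ring):  # wrap around the ring
            idx = 0
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("customer-42")
ring.remove(owner)  # simulate the owning node (or data center) dying
# the key is now served by the next node on the ring
assert ring.node_for("customer-42") != owner
```

The point is exactly the failure story they told: removing a node only remaps the keys that node owned, everything else stays put, so a dead machine does not reshuffle the whole cluster.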

Unfortunately I missed the session from the YouTube guys. So I hope the rest of the sessions will be online soon too. I'll post an update here once I find out.

There are a couple of nice write-ups of the conference as mentioned here.

Update: Sylvain found them. Definitely worth watching the YouTube one. If you don’t know what MapReduce/BigTable is about also check out the Google talks.

Scalability Conference in Seattle

Today I am heading off to the Google Scalability Conference in Seattle (Bellevue). It's not really around the corner, but I am really looking forward to it. The agenda of the conference sounds quite interesting and might give some new perspectives on the technical challenges we are facing at Joost. But it's also interesting to visit this city from a far more personal point of view. It's incredible how many bands that influenced my taste in music are/were from Seattle. Just one word: SubPop.

If you have suggestions on what to see or want to meet up, let me know. I am flying back on Sunday.