Workstation Resource Utilization Goes Local (Again)

For much of the last 25 years, computer scientists have looked for ways to make use of the unused resources of workstation machines. During this time the capacity of individual machines has greatly increased, as has the number that are available and under-utilized. But while the focus of this work gradually shifted from local area networks to web-scale systems, the last few years have seen a shift back to the small scale. Here’s why.

The Workstation Revolution

In the late eighties, a group of researchers from the University of Wisconsin were trying to parallelize computation over workstation machines.

They observed large collections of machines, connected but under-utilized, and many users whose demand for computation far exceeded their current capacity. If the unused capacity could be harnessed, it would provide a greater degree of parallelism than was previously available, using only existing resources.

Their approach worked, and the resulting system, Condor [1], is still used today. Indeed, the two fundamental problems of distributed desktop computing are the same now as they were 25 years ago:

  1. The desktop is run for the benefit of its interactive user, so background processes shouldn’t affect that user’s activities.
  2. The desktop is unreliable, and can become unavailable at any time.

Condor was designed such that computations could be paused, making it possible to minimize impact on user activity during busy periods. It was also designed such that computations could be re-run from a previous checkpoint, ensuring the failure of a machine didn’t critically impact results.
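
To make that concrete, here is a minimal sketch of the checkpoint-and-resume pattern in Python. It is my own illustration rather than Condor’s actual mechanism, and names such as CHECKPOINT_FILE and expensive_step are purely hypothetical; the point is only that a job which saves its state after each unit of work can be stopped at any moment and later resumed from where it left off.

    # A job that records its progress so it can be paused or killed at any
    # point and resumed later from the last checkpoint (illustrative only).
    import json
    import os

    CHECKPOINT_FILE = "job_state.json"  # hypothetical checkpoint location

    def expensive_step(x):
        return x * x  # stand-in for a real unit of work

    def load_checkpoint():
        if os.path.exists(CHECKPOINT_FILE):
            with open(CHECKPOINT_FILE) as f:
                return json.load(f)
        return {"next_item": 0, "partial_sum": 0}  # fresh start

    def save_checkpoint(state):
        with open(CHECKPOINT_FILE, "w") as f:
            json.dump(state, f)

    def run_job(items):
        state = load_checkpoint()
        for i in range(state["next_item"], len(items)):
            state["partial_sum"] += expensive_step(items[i])
            state["next_item"] = i + 1
            save_checkpoint(state)  # safe to interrupt after any step
        return state["partial_sum"]

With something like this in place, a scheduler only has to stop the process when the interactive user returns; nothing is lost beyond the step in progress.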

It created a pool of computational capacity that users could tap into. This was perfect for local-area networks: long-running computations could be run on remote machines and paused whenever those machines were in use. Since the machines typically resided within a single organization, administrators could ensure that they rarely became unavailable for reasons other than user activity.

The Internet Generation

The nineties saw an explosion in the number of workstations that were available, connected, and unused. With the advent of the World Wide Web, the problem of resource utilization had gone global.

The characteristics of this network were vastly different from those of the LANs used by Condor. Most importantly, with machines outwith the control of any one organization, they were less reliable than ever. Worse still, results could not be trusted. For example, a user with a home workstation may have had a lot of unused capacity, but unlike the lab machines typically used by Condor, there was no guarantee that the machine would remain available for any length of time. Even if it was, the machine sat at a remote site, so we now had to consider that results could be faked or corrupted.

More positively, there were now so many of these machines available that any system able to use a tiny fraction of their resources would be able to harness great computational power. In fact, there were now so many machines available that systems such as Distributed.net [2] and Seti@Home [3] could take advantage of redundant computation.

Both systems broke computations down into relatively small problems, and sent duplicates of these problems to many machines. The problems were small enough that even a short period of activity would allow a remote machine to complete one, and the number of duplicate computations was great enough to ensure that (a) some of the machines would eventually finish each computation, and (b) enough of them would finish that their results could be compared, allowing corrupted results to be detected and ignored.
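
To illustrate, here is a toy sketch in Python of that agreement check. It is not the actual Seti@Home or Distributed.net protocol, and the quorum of three matching answers is an assumption; the point is simply that an answer is accepted only once enough independent copies of a work unit agree.

    # Accept a work unit's answer only when enough duplicate results agree
    # (illustrative only; real systems also track which machine returned what).
    from collections import Counter

    QUORUM = 3  # matching results required before an answer is accepted (assumed)

    def accept_result(results):
        """results: answers returned so far for one work unit, possibly corrupted."""
        if not results:
            return None
        answer, count = Counter(results).most_common(1)[0]
        return answer if count >= QUORUM else None  # not enough agreement yet

    # Three honest machines returned 42, one returned garbage, one never replied.
    print(accept_result([42, 42, "corrupted", 42]))  # -> 42
    print(accept_result([42, 7]))                    # -> None (keep waiting)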

Moving beyond computation, P2P systems such as Napster [4] and BitTorrent [5] allowed people to use the storage capacity of their workstations to share and distribute data. As Seti@Home did for computation, these systems stored redundant copies of data across many machines, meaning the system’s operation was not dependent on the continued availability of any particular workstation.

Local Again

Until recently, this was as far as these systems went. But the ubiquity of multi-machine home networks and an ever-increasing demand for storage have created a new focus: storage on local area networks.

Napster and BitTorrent work well on a large scale, when the sheer number of users makes it unlikely that popular items will be inaccessible, but poorly on a small scale, where they provide no guarantee that all data will be backed up.

Workstations and LANs now have the capacity to be used for storage, without affecting the activities of the user of a machine (problem 1). New storage systems are capable of ensuring that there are always enough copies of data to prevent data loss (problem 2).
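
As a rough sketch of that second point (my own illustration, not any particular vendor’s implementation, and the replication factor of three is an assumption), the core idea is to place each block of data on several machines and to re-replicate whenever one of those machines disappears:

    # Keep a minimum number of copies of each data block across LAN machines
    # (illustrative only).
    import random

    REPLICATION_FACTOR = 3  # assumed minimum number of copies per block

    def place_block(block_id, machines, placements):
        """Choose machines to hold a new block."""
        targets = random.sample(machines, min(REPLICATION_FACTOR, len(machines)))
        placements[block_id] = set(targets)

    def repair(placements, machines, failed):
        """Re-replicate blocks that dropped below the target when a machine vanished."""
        alive = [m for m in machines if m != failed]
        for holders in placements.values():
            holders.discard(failed)
            candidates = [m for m in alive if m not in holders]
            while len(holders) < REPLICATION_FACTOR and candidates:
                holders.add(candidates.pop(random.randrange(len(candidates))))

    machines = ["alice-pc", "bob-pc", "carol-pc", "dave-pc"]
    placements = {}
    place_block("report.docx#0", machines, placements)
    repair(placements, machines, failed="bob-pc")  # still three copies afterwards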

Companies such as AetherWorks (disclaimer: this is who I work for) [6], AeroFS [7], and BitTorrent Sync [8] are working to make use of this under-utilized storage space.

Why do we do this? Many small companies have storage needs, but no desire for a server (and regulatory compliance issues with the cloud), while many others have one or two servers but no redundancy. The ability to provide backup without additional capital expenditure makes that additional backup or server a much more realistic prospect.

Resource utilization has gone local again. This time the focus is on storage.

Footnotes

[1] Condor: A Hunter of Idle Workstations

[2] Distributed.net

[3] Seti@Home

[4] Napster

[5] BitTorrent

[6] AetherStore

[7] AeroFS

[8] BitTorrent Sync

This is a repost of a blog post I wrote over on the AetherWorks Blog earlier this year.

PhD: Perfect Preparation for a Start-up?

In late 2011 I moved from life as a Computer Science PhD student to my current role at AetherWorks. Going from a town of seventeen thousand people to a city of eight million was a big change, but going from a life in academia to life in a start-up was much easier than you might think.

This post is a response to two sentiments that I’ve often heard expressed, which I’ve paraphrased below:

“It must be a major change going from a PhD to the world of work.” – The “transition” problem.

“A PhD doesn’t prepare you for life in the world of work.” – The “relevance” problem.

Clearly I don’t think either of these is true, but I do find it surprising how easy my transition was, and how useful my past experience has been[1].

Transition

Is there a chasm between the roles of a CS PhD student and a start-up founder? I don’t believe so – in fact, when writing down the stages of each role, the job descriptions are strikingly similar:

Find the Problem: In the initial stages your focus is on finding a problem. You search for a gap in existing work that is ready to be exploited.

Initial Research: You spend the initial months of your work scoping out the identified area, examining existing work and figuring out what related areas have potential relevance. In this stage, you work to find out where the problems are, and start to think about the best ways to fix them.

Development: Once you have established the problem space, you start developing your idea. This should be fast, because you want to show the initial results of your efforts as quickly as possible. You start developing your MVP.

The earlier you get your work out, the quicker you receive the feedback that is critical to your progress: Where can you take the idea? Has it been done before? Does it need to be refined?

Pivot: At some point in these processes you realize that your initial idea wasn’t quite as original as you thought, or you find a related area and a problem in much greater need of a solution. You pivot.

Titles: While this is happening you keep a healthy disdain for your job title, because it seems to mean very little. When you work with such a small group of people, you have to wear many hats — the developer, system administrator, network engineer, and barista are all the same person[2].

Collaboration: You spend days with a very tightly-knit group of collaborators, working through ideas and combing over documents[3]. When the work is presented one person might get the credit, but it’s a team effort.

Self-Motivation: When you are one of only a few people working on a project, a week of unproductive work can seriously damage a project’s velocity. You don’t have a boss looking over your shoulder, so you need to motivate yourself.

Networking: You don’t want to develop your idea in a vacuum, so you go out and meet others in your field at conferences and meet-ups. You learn the importance of a good elevator pitch.

Relevance

Every PhD is different, but I find it hard to reconcile my experience with the view that it produces students with an intense focus but few transferable skills[4].

There are periods of confined study, but as with most jobs, there are opportunities for personal development. My ‘job’ happened to be the work of a PhD student, but that didn’t prevent me from developing other interests.

I tutored, lectured, and presented where possible. I even designed coffee mugs[5] and indulged in pointless side-projects, all of which produced a well-roundedness that has served me well in a start-up. The notion that a PhD creates people who aren’t prepared for work outside of academia is not an inherent truth. As with any job, it is what you make it.

Primary Difference

The primary difference between a PhD and a start-up is the end goal. In my PhD I created software with the goal of evaluating it and publishing results, so there were bugs and usability issues that I wasn’t focused on fixing. Conversely my current job is to make the most stable and usable product possible, which leads to a much stricter view of development, tighter schedules, and a more stringent QA process. The end goal of a start-up is to create a useful product that people will pay for, whereas the end goal of a PhD is to create something that advances the state of the art.

Some of the skills I developed during my PhD have proved to be incredibly useful for life at a start-up. I improved my time management and learned what it takes to motivate myself to do my best work, even on days when no-one is looking — skills that aren’t as necessary in a deadline-driven undergraduate degree, and which aren’t necessarily easily obtained in an entry-level job. It also made me a better researcher, and a more analytical thinker.

Ultimately, very few jobs can fully prepare you for life at a start-up, but a CS PhD can come surprisingly close.

 


[1] Major disclaimer: what I’m going to describe is and was true for me. I had a very practical Computer Science PhD, I had to write a lot of code, and I did a lot of work on software architectures. This makes for an easier transition than many other types of PhD even within Computer Science.

[2] Arguably the greatest output of my PhD was a talent for making smooth lattes. You won’t find this on my resume.

[3] A PhD is typically thought of as a very solitary endeavor, but it’s easy to forget the countless hours of discussions and input from supervisors and colleagues. Your name is on the document, but it wasn’t created in isolation.

[4] This is a view that I’ve heard from many people, but sadly most of the references I could find online were from PhD students themselves [a] [b] [c].

[5] This is the second greatest output of my PhD. Sadly, my most innovative designs were banned by the university’s image guidelines.

This is a repost of a blog post I wrote over on the AetherWorks Blog earlier this year.

How do you teach Software Quality?

When I was a teaching assistant I was asked to give a talk on ‘Software Quality’ to our incoming junior class, which made me realise one thing: you can’t teach software quality!

For a start, it’s so subjective. The narrator in Zen and the Art of Motorcycle Maintenance says much the same thing about creative writing. Quality is difficult to define. If you ask students to rank essays (or programs) from best to worst, they will probably be able to reach a consensus – they have an intuitive understanding that one essay has more quality than another – but it’s much more difficult to identify the parts of the essay that give it quality. Over time, with practice and reading, a student develops the ability to recognize that quality, and it is that practice which helps them develop both the understanding and the ability to create it.

To me this seems similar to the food tasters Malcolm Gladwell discusses in Blink. The average person can taste Coke and Pepsi and give their preference for one or the other, but may find it difficult to explain why they have this preference. On the other hand, an experienced food taster is able to identify the multitude of flavours in each drink and categorize them by their taste and strength. The food taster, with their well-developed analytical ability, is able to put words to the feelings and intuition that the untrained person has.

With programming classes, students are lectured on the techniques and design patterns that help to produce quality software, but they can only begin to understand where these techniques should be used (and, possibly more importantly, the extent to which they should be used) with experience. It’s almost as easy to write bad code by over-applying design patterns and ‘good practice’ techniques as it is by not using them at all. There is hardly ever one right answer.

I don’t think it necessarily helps that the focus is on writing code rather than analysing the work of other people. As soon as I started working on group projects, I think I developed a far greater appreciation for the quality (or lack thereof) of my own code.

Refactoring as an Afterthought

A more obvious problem relating to software quality is the treatment of refactoring as an added extra. When I write a paper I tend to write out the first draft as a long brain-dump, saying everything I think needs to be said. That draft is then constantly refactored until I have a cohesive piece of work. In programming (particularly with university assignments) the first draft is often finished when it meets the general requirements of the task, so there is little immediate need to redraft or refactor the work into a better state.

I’m not sure it’s possible to realise the danger of this until you’ve worked on a larger project, where poor-quality code can cost you days of debugging and refactoring. This is probably the way most people learn of the need for ‘quality’ software anyway.

What Can You Teach?

Beyond teaching students the design patterns and architectures that can help to improve the structure of their code, I think the only thing you can do is emphasize the importance of taking an analytical view of code. The best programmers I’ve worked with are constantly unhappy with the state of their own work, continually looking for ways to improve its clarity and structure. Ultimately you can’t really teach that; you can only motivate the need for it.

There are a few books I’ve read that help in this respect. I particularly liked:

There are probably many more, but I haven’t read them yet!

Other books are just good at giving you extra ways to think about and structure problems:

  • Design Patterns (Gang of Four or Head First – pick your poison).
  • Extreme Programming Explained. You might not agree with all of it, but principles like incremental design and constant refactoring work for me.
  • Programming Pearls. I started reading this last week, so I can’t give a complete review, but the first few chapters are great examples of clever programming.

My talk ultimately focused on motivating examples, aiming to get across the pain of doing things the wrong way.