#63 - 26 Feb 2021

Final step in getting ready to hire - the job ad; ARM vs x86 for bwa-mem on AWS; Experiments and your team; Software leadership communities; Research s/w can be a tangled mess, and that's ok

Hi, all!

Thanks for your responses to last newsletter. Readers who responded were broadly pretty happy with the all-in-one newsletter, even if gets a little long; topics that were of interest for future write-ups were (in order of number of votes):

  • Performance management
  • Goal setting for managers
  • Strategy
  • Product management
  • Peer mentorship
  • Levelling & promotion in research computing

I appreciate all feedback; please don’t hesitate to email me with any questions, comments, suggestions or other - just hit reply if you get this by email, or email me at jonathan@researchcomputingteams.org.

Now for the last article on getting ready to start hiring … or skip to the roundup.

Once you have an internal job description, prioritized list of requirements, know how to evaluate against the requirements, you’re ready to start working on the job ad - which is different from the job description.

In research we’re used to thinking of jobs like postdoc or faculty positions as rare; so you just post an ad on wherever and candidates will do the work to figure out what the job listing means, and start preparing their applications.

Computing jobs… are not like that. There are roughly 1.7 bajillion computing jobs posted at this very moment, and your job is just one. Writing a good job ad, paying to put it somewhere, and describing in some detail what the process will be once the candidate applies is more or less the bare minimum to make sure you get a reasonable number and cross-section of candidates.

The job requirements, and the job description as a whole, is an internal document. It should be available to candidates, and you’ll use what you’ve determined and written up in the job ad, but the job ad is an entirely different document. It’s first and foremost an advertisement; you are selling your job to the candidates, demonstrating why they’d want to find out more about your job and your team and how it might be relevant to them.

It’s possible that your institution may have very rigid rules about how jobs get posted on their website; there may be a lot of boilerplate, or it may have to be precisely the formal HR-specified job description. The title of the web page may have to be exactly the job title, and in a lot of our institutions, those titles are utterly meaningless (quick, what’s the difference between “Technical Analyst”, “Research Associate II” and “Application Development Specialist”, and why should a candidate should be searching for one over the other?). I get it; my own current institution has just appalling HR-mandated job ads.

It’s ok though, because none of this stops you from posting your own good job ads elsewhere that direct people to the incredibly uninspiring institutional job posting. Most institutions will allow you to do have different job ads elsewhere as long as all applications go through the god-awful HR website and job ad.

You can pay modest amounts of money to get a lot of reach on sites like Github or StackOverflow or any of a number of technical job boards like Dice or Indeed. There are also specific places to hire for research computing staff. If you’re hiring for anything vaguely related to software development US-RSE or the UK Society of RSE will take your add for free; if you’re looking especially for life sciences expertise like bioinformaticians, [jobRxiv](https://jobrxiv.org) or Bioinformatics.ca may be of interest. If you’re hiring managers, well, I’ll share it with our community. And you can also just post a decent job ad to your team’s website and circulate it on twitter and elsewhere.

So what should go in your job advertisement? While you’re certainly going to have a listing of what you require for a candidate for this position, that will come a little later in the ad. First off, you’re going to have a very high-level view of who you’re looking for, which you already have a good sense of.

Then you’re going to want a list of things that are plusses of the job for the person you’re looking for. If you’re in research computing, you’re almost certainly supporting cutting-edge research in some field. The person you’re looking for loves that, mention it! Does it end up having real-world impact? Does the job come with (possibly optional) high profile in international collaborations? Opportunity to work with really cool technologies? Present at conferences, once that’s a thing again? What will the candidate get to do every day that may be exciting to the person with the capabilities you’re looking for, that distinguishes this job from making twice as much improving web ad click-through rates?

Ideally, there should be a description of what the hiring process should look like. That can be part of the job ad, or it can be something that’s sent to candidates once they apply. It helps clarify the process for you and your team members, and gives the candidate some indication that your team is professionally run and that their carefully honed application isn’t just swirling into some black hole.

We’ve had a number of articles in the roundup about making sure the job ad doesn’t turn away candidates. There’s tools like Gender Decoder that can check your ads for exclusionary language; it’s a very blunt instrument, just based on word lists, but tools of that sort can make sure you’re not saying anything unintentionally.

Another big problem is that many people from groups that are not-overrepresented in technology fields will self-select out if they don’t meet a couple of your requirements; hopefully we’ve reduced that risk by being pretty focussed on only having requirements we actually care about, so that the requirements we list are actually requirements and so candidates don’t have to guess as to exactly how required they are.

Another potential turn off is having overly swashbuckling descriptions (“disruptive!” “ninja!” “rockstar!”), but that’s less often a problem in our field. Since we presumably care in our teams about respectful collaboration, thoughtful inquiry, and learning, including those in our job ads as plusses will help our job ad reach as many candidates as possible, including those who have left jobs in the past because they weren’t respectful, thoughtful, or good learning experiences.

… and that’s more or less it! You’re now ready to start the hiring process. Good luck!

And now on to the roundup.

Managing Teams

Leading your engineering team with ‘experiments’ not ‘processes’ - Cate Huston, LeadDev

One of the recurring themes of this newsletter is that those of us who have worked in research for a while have at our disposal the advanced skills needed to manage teams and projects to support research, but that no one taught us the basics.

Sometimes, the basics just mean applying the same rigor and processes to our new work that we did in our old: lab notebooks become one-on-one notes; thinking about the complex systems we were modelling becomes thinking about the complex people systems we’re part of; and hypothesis generation and experimentation in our work becomes hypothesis generation and experimentation about our work.

Huston’s article suggests reframing changes in team processes as experiments

Now, instead of ‘process changes’ I talk about ‘team experiments’.

and that has several advantages:

  • Experiments dispel the fear of the process monster
  • Experiments improve buy-in
  • Experiments reframe solving problems as learning

We’ve gone through a lot of experiments in our team. In some, the hypothesis ended up being soundly rejected and we moved away from the experiment, but others have taken on a life of their own and changed how we worked for years.

In some ways, the experiments that we roll back are even bigger successes for two reasons - we’ve learned something, and the team members see that their feedback is taken seriously when we try something that doesn’t work for them.

By the way - I often reference a transition from a research career to a career managing research computing teams. I hope I’m not inadvertently suggesting that this is the best, or only, career path to get to this work! It’s just a common one, and the one I know the most about, so it tends to be what I use for context.

Managing Your Own Career

Software Engineering Leadership Communities - Pat Kua

It can be lonely being a manager, especially these days when there’s a lot fewer opportunities to communicate with peers across the organization (and for some of us in academia, even within the organization there are precious few peers doing the sort of work we are doing). There are a few online management communities where people can ask for advice, take part in conversations about best practices, etc. Kua lists four slack communities:

  • CTO Craft Community
  • Engineering Managers Slack Group
  • LeadDev Slack
  • Rands Leadership Community

I’m part of Rands (which is very active), LeadDev (which isn’t), and am going to try Engineering Managers. Rands is great, but some of the issues we face in research computing are quite different from those found in industry and so there’s context missing. When our own community with this newsletter gets larger, would it be worth trying to create some channels on one of those communities, or put together something of our own? What would be useful to you with your team and your questions?

Product Management and Working with Research Communities

Ten simple rules for starting (and sustaining) an academic data science initiative - Micaela S. Parker, Arlyn E. Burgess, Philip E. Bourne, PLOS Computational Biology

Many research computing centres are trying to figure out how to launch or scale up a data science core facility or research institute. Creating anything new within an organization is a challenge, even when the winds are in your favour. Parker, Burgess, and Bourne offer some very sage advice on not just starting up a data science effort in particular, but creating something new within an organization more broadly.

Some key points made are:

  • Don’t try to own everything
  • Leverage champions
  • Establish a set of guiding principles
  • Focus on interdisciplinary, but don’t overdilute

All of these are about having a very clear understanding of what you want to achieve, the organizational environment and which teams have skills or capabilities that can be complementary to that and building on the enthusiasm of others. Another key point:

  • Recognize and elevate data, software, and workflow contributions

As you’re creating the thing, you need to make sure that the incentives within your piece of the organization don’t work against what you’re trying to build. A data science facility within a team that doesn’t consider data, software or workflow work “real science” is going to have a hard time retaining people.

NAG Delivers Machine Learning Guidance via New Azure HPC + AI Collaboration Centre - Doug Black, inside HPC

This is an interesting and natural move, with NAG already a trusted name in both industry and, in some jurisdictions, academic research computing.

The reason I mention it here is that I don’t think it’s fully sunk in yet how much of a potential competitor these sorts of partnerships are to in-house acadeic research computing teams post-pandemic. When the world starts returning to normal, VPRs and department heads are going to realize that remote research computing support has worked quite well. And if teams don’t need to be on campus, do they need to be campus employees?

Let’s say that a major project, well-suited to a cloud implementation for whatever reason, was to come up in your academic institution, and let’s say your team would normally be in the conversation for supporting it. Do you have a good reason why the researcher should approach your team rather than NAG? Is it a better reason than “we’re cheaper”? Because for a sufficiently high-profile project, being the lowest bidder isn’t going to be enough.

Research Software Development

Research software code is likely to remain a tangled mess - Derek Jones, The Shape of Code

This paper, and its hacker news comments, made the rounds this week. The tone is somewhat exasperated and unhopeful, but I honestly don’t think it’s an especially negative article.

Regular readers will know the view of your correspondent is that research software is usefully seen as a technology, and that it’s maturity can be thought of as marching up a technological readiness ladder - how ready the technology is to be used in a widespread manner.

The vast majority of research software is at a proof of concept or prototype stage. Enough to demonstrate that an approach does or doesn’t work, and that’s it. It’s not intended to be the basis of something grander. Followup work shouldn’t try to build a production-strength tool on top of it, but take experiment for what it was - a piece of data saying this approach could work - and build a higher-rung implementation from scratch.

And honestly, most of the time even that won’t happen. The approach didn’t turn out to be as successful as initially hoped, or it was a success but not one anyone cares about. Most open-sourced software, research or otherwise, just fades away into obscurity. From the article, Jones writes:

I suspect that the lifetime of source code used in research is just as short and lonely as it is in other domains. One study of 214 packages associated with papers published between 2001-2015 found that 73% had not been updated since publication.

So yes, the code may be hard to use and understand, but so what?

I think people have an unrealistic expectation that research code should just build out of the box

and having it look prettier or be robust may not be a particularly useful activity for these projects:

I would argue that a more useful investment would be in testing that the software behaves as expected

The one thing I disagree with is that Jones attributes the reason for the prototype-level code is that software writing is a low status activity for researchers academia. I disagree with the causality of the statement; I think it’s flipped. Software writing is a low status activity because prototype-level code is perfectly adequate and in fact the appropriate level for initial research. “Productionizing” software for other researchers to use hasn’t found an adequate home in the research ecosystem yet.

Production quality code of proven methods isn’t a research output, it’s a research input, and we haven’t figure out a good way to fund or a good organization home for shared digital research inputs yet.

The State Of ROCm For HPC In Early 2021 With CUDA Porting Via HIP, Rewriting With OpenMP - Michael Larabel, Phoronix

Larabel reports on a FOSDEM talk by Georgios Markomanolis at CSC (slides here) about preparation for the upcoming AMD-powered LUMI system, and CSCs experiments with porting CUDA to AMD via source-code based translation (!) with the HIP compilers, as well as attempts to use OpenMP GPU offload.

The HIP translation seemed to go pretty well, although it isn’t fully automatic especially with Fortran:

With Fortran code, Hipfort is necessary as as interface library for the GPU kernel and more manual porting compared to the automatic translation. But with one test case at least under their HIP version they found it to be 30% faster, but part of that at least may also come down to compiler stack differences, as noted.

On the other hand, OpenMP offload doesn’t seem to be quite ready yet:

LUMI researchers are also exploring AMD’s OpenMP device offloading support that was recently upstreamed in LLVM and continues being developed in the downstream “AOMP” project. So far they have found AOMP to have some performance issues but are expecting it to be improved by the time LUMI is deployed. They expect HIP will ultimately perform better than using OpenMP offloading but may use OpenMP for Fortran or other complicated codebases.

Estimating your way to success - Rod Begbie, LeadDev

Estimating gets a bad rep because our estimates… aren’t very good. The future isn’t knowable! But Begbie reminds us that the purpose of estimation isn’t to get perfect duration predictions but to structure initial conversations about what is to be done and what needs to be done to get there; and then to learn from the estimates to do better the next time.

Begbie’s estimation rules are to keep tasks estimated duration between a half and two days - larger tasks should be broken down, smaller tasks should be aggregated.

Research Computing Systems

With Supercheck21 Symposium, NERSC Opens International Checkpoint/Restart Dialogue - HPCWire

I’m somewhat chagrinned to say I didn’t even know that this checkpointing symposium was even a thing. The HPC Wire article gives an overview of the symposium, including mentioning tools like MANA and DMTCP,

Emerging Data & Infrastructure Tools

A generalized approach to benchmarking genomics workloads in the cloud: Running the BWA read aligner on Graviton2 - Mark Schreiber and Ankit Malhotra, AWS Public Sector Blog

Even if you’re not moving your workflows to the cloud, it can be an excellent place to play with different kinds of infrastructure. In this post, Schreiber and Malhotra go through in great detail their benchmarking with BWA-MEM, a ubiquitous tool in genomic bioinformatics.

Looking at both time-to-solution and cost-to-solution for an alignment run (on some GiaB reads, for those in the field), they look at the effects of instance type, RAM size, availability of local instance storage, and architecture - x86_64 vs ARM.

It’s well worth a read if this is your field, and a good overview of what careful cost benchmarking looks like even if you’re not. But if you want a single takeaway, it’s that for this memory-bandwidth and integer-comparison heavy job, the graviton nodes handily beat their x86 counterparts on both metrics.

ARM vs x86 instance types for BWA-MEM benchamark

Ctrl IQ emerges to drive orchestration into the cloud - Michael Vizard, Venture Beat
The Founder of CentOS is Building the Next Platform - Timothy Prickett Morgan, The Next Platform

One of the recurrig themes of this newsletter is that production-quality research software development is much closer to industry best practices (unit testing, CI/CD pipelines, packaging, documentation) than production research computing systems operation is. It seems like Gregory Kurtzer, of CentOS and Singularity fame, aims to help change that.

Vizard and Prickett offer an overview of the new effort. Kurtzer aims to bring more flexible, DevOps/SRE-like operations to traditional HPC shops, using Singularity and Warewulf but also a HPC-friendly Kubernetes distribution and a new orchestration tool. Kurtzer’s Ctrl IQ also plan to offer these services in the commercial cloud.

There’s not enough in these initial looks to give much of a sense of whether the new effort is likely to succeed, but it’ll be interesting to keep an eye on.

Calls for Papers

Eurographics Symposium on Parallel Graphics and Visualization - 14 June, Zurich, Abstracts due March 5, papers due March 12

Topics of interest for EGPGV include Parallel graphics, Rendering of very large data sets, Parallel visualizaiton and analyitics, and Processing of large data sets for visualization or analytics. climate research, and HPC in education.

ISC 21 - 24 June - 2 July, Posters due 3 or 10 March, PhD Forum talks due 10 March

The call for papers has closed, but poster sessions remain open; project posters are due the third, research posters the 10th.


Supercomputing Asia - 2-4 March, $50.

Events include a OneAPI workshop, talks about HPC-AI in biomed, industry applications, quantum computing,

A new distributed data analysis framework for better scientific collaborations - Sommer et al, SORSE talk, 3 March 10:30-10:45 UTC

From the abstract:

In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup, where project partners do not necessarily work at the same institute, and do not have access to each others resources. We present an application programming interface (API) developed in Python that enables scientists to collaboratively explore and analyze sets of distributed data.

Little Minions in Archaeology - an open space for RSE software and small scripts in digital archaeology - Florian Thiery, Ronald Visser, Moritz Mennenga, SORSE talk, 3 March 10:45-11:00 UTC

In our daily work, small self-made scripts, home-grown small applications and small hardware devices significantly help us to get work done. These little helpers - “little minions” - often reduce our workload or optimise our workflows, although they are not often presented to the outside world and the research community. […] We have to promote these little minions in the RSE community and make them visible.


Interested in playing with RISC-V, but not enough so to buy some hardware? Here’s a docker image of an emulator you can ssh into, run with QEMU.

You’ve probably already seen this, but that’s sort of the point - iceberger, a lovely example of how simple and fun a meaningful interactive visual application can be that nonetheless conveys real scientific insight to counter a common misconception.

Last week’s newsletter (admittedly shorter than the previous few) would weigh at least 875 picograms, according to fileweight and IBM.

If you’ve really want to understand how compiled programs become code executing in memory, some recent links give you the whole process - a two-parter on writing an assembler to generate an ELF file, followed by the detailed readme (and the code) of a hobby project to build an extremely fast linker, and then a 15-part(!) series on writing an executable packer.

Relatedly, separated mainly by 62 years, Booting the IBM 1401: How a 1959 punch-card computer loads a program.

Roapi - a rust-based tool for standing up read-only SQL, REST, and GraphQL APIs atop postgres or mysql databases, or google sheets, airtable tables, CSV files, JSON files, and more.

A deep look at the implementation of thread-level storage.

The physical, on-disk steps involved in performing an SQL transaction.

The rest of the world is discovering what a lot of people in resarch computing have known for a while - remote development has a lot of advantages.

Confidential computing in untrusted environments, making use of secure enclave technology like Intel’s SGX, is getting more and more feasible. This article for a confidential-computing version of go, ego, is a good tutorial for one such approach.

The use of shadow CI jobs to test not just new versions of your code your code but new versions of your build tools