#71 - 23 April 2021

Hi, everyone!

I have a somewhat short newsletter for you this week; exciting things are happening at work with product adoption and hiring, both of which are taking up a lot of time but are unmistakably good. In addition, we’re having increasingly lovely weather, dear friends are getting vaccinated, and actions and decisions made months ago are are finally appearing in outcomes that are beginning to come together in a pleasingly coherent manner. I hope you, yours, and your team are doing equally well.

For now, on to the roundup!

Managing Teams

Don’t hire top talent; hire for weaknesses - Benji Weber
Why you should invest in undervalued people - Joe Emison
We need to talk about your Q3 roadmap - Wherewithall

Two common paths into research computing management - from academic research, or from computing and tech - tend to lead to the same mistake in hiring. In both cases, we’re used to looking for the “smartest”, “best”, “greatest promise”, “top” of the applicants, whether for postdoc positions or new senior devs.

But for most of us, this isn’t the right approach. Maybe you are supervising a collection of individuals who will each do independent work. Mostly though, we hire individuals to build a team. For that we need to hire at least in part for gaps in and weaknesses of the team as much as for strengths of the individuals. Or as Weber says:

Instead of “how can we find the smartest people?” think about “how can we find people who will make our team stronger?”

I don’t agree with everything in Weber’s article - we still do need consistent processes, and to look for did-not-demonstrate-needed-skills. But the basic idea of building the team rather than a portfolio of “top” individuals is a vital one. Some of the basic skills you’ll be hiring for will change over time as your teams gaps and weaknesses shift.

The good news about this approach is that it makes it easier to see areas that can be filled by still-junior people, as advocated for by Emison; in # 68 we talked about the work it does take making for more resilient teams, but also it’s a move in the direction of equity as well as being, frankly, significantly easier to hire undervalued staff.

But the suggestion that we should be hiring for gaps sits a bit uncomfortably with this week’s reminder from the Wherewithall newsletter that, as we heard last week in # 70, there are going to be a lot of people leaving or at least taking extended periods of time off as soon as things start getting back to some kind of normal. The adrenaline has long since worn off, and people are looking for change after an exhausting year.

It’s a bit more work to be prepared to hire for gaps when your list of gaps might change suddenly if people leave. Your “intermediate openstack administrator” or “senior research data curator” or “junior research software developer” positions won’t be cookie-cutter, fungible people. The roles will have a consistent set of base-level requirements to be a successful and productive member of your team but a shifting set of skills and behaviours you need to make your team stronger. Understanding your current team’s strengths and gaps, and the full package of what each of your current team members bring to the team - and documenting all of this! - will help you be ready.

How to have meetings that don’t suck (as much) - Danielle Leong

More and more collaboration occurs asynchronously these days, but meetings are vital for coordinating that collaboration. Meetings are also routinely done really poorly, and academia is (or should be) famous for how poorly they’re done. Whether we’re having a meeting to make a decision, solve a problem, gather input, share information, or point everyone in the same direction, Leong calls out some things that should be crystal clear:

Who is leading this meeting?
Why are we having this meeting?
What is the purpose?
What is the agenda?
What are the action items?

The middle item, “what is the purpose”, is badly under-used. I used to think that having an agenda was enough; but having a really crisply defined purpose, especially for recurring meetings, is in the long term even more important. You can’t evaluate whether a meeting was effective or not unless there’s a goal or purpose in mind. An agenda should serve the purpose, and it often implies the purpose, but having it explicitly stated makes it much easier make a meeting better.

Having a purpose also makes it clear when a meeting should be multiple, smaller meetings. If a meeting turns into a grab-bag of purposes, it should be split up. Leong has a list of suggested (short) lengths of times for many different kinds of meetings.

There’s a lot of other stuff in Leong’s article, including links to other good resources like designing useful meetings.

Product Management and Working with Research Communities

COVID-19: One year on for medical research charities - Association of Medical Research Charities

Another reminder that even with the importance of health research never more salient, charity-driven health research foundations have been pounded by the pandemic, and other sources of funding for research in general are in some danger of retrenching. For many of the researchers we support, it’s going to be a tough year or two to secure funding.

Commission seeks further views on microcredentials - Ben Upton, Research Professional News
The Decline of the Master’s Degree - Alex Usher, Higher Ed Strategy Associates

There was a lot of breathless punditry around MOOCs, and then microcredentials, last decade. Post hype, I think we’ve learned that such things are indeed potentially pretty useful when aimed at people who have already been trained to learn on their own (so more at the graduate level than undergraduate) as a way to pick up new skills.

And I think there’s opportunity here for academic research computing groups. As Upton rights, the EC is looking into microcredentials, to see what role they could play in learning and skills development in the future in Europe. Other governments and organizations elsewhere are starting to take a more nuanced look again, too.

There’s an interesting pairing here with an article from the start of the year. Usher wrights about a potential future decline of the Master’s degree. Universities love to set up professional Master’s degree programs, because they can charge whatever they want, set them up in any of a number of ways, and churn out students with new skills and desirable credentials. There’s nothing wrong with that - everyone’s happy! But the very pliable and anything-goes nature of professional master’s program suggests that the value is in the training and experience, not in the precise structure of the packaging.

Some research computing groups have had success in setting up masters programs in HPC, or Data Science, or something similar. But it’s a big commitment for typically pretty small groups, and it can easily tie up a lot of resources (or get taken over by a CS department or similar). As micro-credentials begin to be taken more seriously, research computing teams - with skills in data analysis and management, systems operations and architecture, and software development - may have something very useful to offer.

Research Software Development

The Initial Preview of GUI app support is now available for the Windows Subsystem for Linux - Craig Loewen, Windows Command Line Blog
Microsoft enables Linux GUI apps on Windows 10 for developers - Tom Warren, The Verge

I find this all very disorienting, having come of computing age during the era when Microsoft was actively trying to kill Linux, but Windows seems to be an increasingly plausible development environment for research software, even for tools that will largely be deployed on Linux systems. In particular, WSL2 now has preview support for Linux GUI apps.

Heterogeneous Processing Requires Data Parallelization: SYCL and DPC++ are a Good Start - James Reinders, The New Stack

Reinders writes in favour of the new SYCL (# 44, # 49) standard and DPC++, an earlier Intel-supported project which supports SYCL and lead on the development of

Reinders writes convincingly of the need for a common programming language for expressing parallel algorithms across GPU/CPU/FPGA/DSP/etc. I’m less convinced that such a model is possible, or that SYCL or DPC++ are it, but certainly Intel has put real resources into DPC++, and FPGA maker Xilinx actively participates in SYCL; there’s certainly real momentum there. Do any readers have experience with either of these two technologies?

Research Computing Systems

Deep Dive into Intel’s Ice Lake Xeon SP Architecture - Timothy Prickett Morgan

Intel has some catching up to do against AMD. The long-awaited Ice Lake architecture is both a “tick” and a “tock”, a process change and a new microarchitecture, and there’s a lot going on here. Though details and benchmarks are still scarce, Morgan gives one of his usual context-rich descriptions of what the new chips are like. There’s been significant improvement in per-core parallelism, with notably higher average numbers of instructions executed per clock cycle (basically the only way to get higher per-core performance for CPU-bound loads these days), and a lot of enhancements for modern data centres. Encryption is a big one, with new instructions for modern encryption methods, full-memory encryption support, and updated secure enclaves; there’s also improved memory and cache performance, particularly better and more predictable latencies and NUMA latencies.

Can You Trust Floating-Point Arithmetic on Apple Silicon? - hoakley

Yes, it turns out you can. This is a blog post is in research systems because I think we’re going to keep seeing increasingly exotic CPUs, and this article is a nice example of using known-hard problems to find out how, if at all, the M1’s floating point math differs from that of say Intel’s on nontrivial calculations. The author took three pathological problems from the Handbook of Floating-Point Arithmetic and ran them on Intel and M1 macs, to get exactly the same answers.

Emerging Data & Infrastructure Tools

Microsoft to Provide World’s Most Powerful Weather & Climate Supercomputer for UK’s Met Office - Oliver Peckham, HPC Wire
Met Office and Microsoft to build climate supercomputer - Cody Godwina, BBC
Microsoft brings Azure supercomputing to UK Met Office - Erin Chapple, Microsoft

Weather service computing is an interesting mix of data integration plus ensembles of modestly large simulations, all with very high SLOs for production runs - the weather forecasts need to be made on time! - along side model development, climate simulations, and joint analyses of past weather data with the simulation predictions.

This is as far as I know the first weather service that’s proposing to partner on its production environment with a cloud provider. Way back in # 6 we talked about ECMWF moving its big-data analysis and data sharing environment into the cloud with the HiDALGO project, but this is quite different. It sounds like the production workhorse will be a dedicated HPE/Cray system, but I can’t tell whether it’ll be essentially hosted in an Azure datacenter (an offering they’ve had for a while) or whether it will be an on-prem hybrid arrangement. Either way, the workhorse system is specialized and has redundancy built in, but sounds like it will be integrated with Azure offerings via Azure Arc and will use the cloudy services for things like data analytics but maybe also for model development or testing production runs.

If any readers know more about this project, I’d be very interested in hearing more!

Not Your Usual Supply Chain Hack: The Codecov Bash Uploader Blunder - Steven J. Vaughan-Nichols, The New Stack
Don’t leak your Docker image’s build secrets - Itamar Turner-Trauring

Every time a new method of distributing software is adopted, we start realizing that we need to be careful about not including anything, like keys or credentials, in the new format that shouldn’t be there.

Vaughan-Nichols what happened with the very nice and useful Codecov service, and how teams and tools that used the bash uploader tool for potentially any time in 2021 likely had all of their environment variables stolen, which in this context probably included CI/CD credentials. This happened because a someone was able to maliciously replace the bash uploader tool, because a credential of the service was leaked in one of their docker images.

As Turner-Trauring points out, that can happen in a couple of non-obvious ways. A credential could be included in a base layer and then “deleted” in a later layer of the image - but OCI images keep those lower layers, and so a credential may appear to be missing but can be extracted. In addition, build arguments are baked into the image, so even if you never copy a credential explicitly into the image, it may be there. Best practice with Docker is to use buildkit and explicit docker secrets which are exposed to the running container but are not saved not inside the image.

Random

Using HTTP range queries to list the files in a remote zip archive without downloading the long whole thing.

You know the rule by now - embedded databases always make the roundup. Mongita is SQLite, for MongoDB.

We talk a lot about legacy codebases here, but as data gets more important we also need to be able to learn how to understand and work with legacy data models. Here’s how to get started with 250 tables and no documentation.

Git-xargs supports running git commands against a list of github repos.

Not a big surprise, but jobs in tech are increasingly being offered as remote-friendly - roughly 1/3 of devops jobs in one sample are being offered as remote - and if we hope to compete we’re going to have to figure out how to do that too (at this point, for many of us the institutional barriers are the problem).

Materials for an image rendering course - the Graphics Codex.

I’ll be adding search capability for newsletter back issues shortly using stork-search which seems pretty nice. if you have a static website with a lot of (say) documentation on it, this is a pretty painless way to generate a speedy full-text search.

Some hard-won lessons from working with very large (billions of rows) databases in PostgreSQL.

COVID has changed a lot. One good thing - we’re never going back to old practices that tacitly enabled researchers who were reluctant to share data. One million coronavirus sequences: popular genome site hits mega milestone.

Written communication is a huge part of what we do in research computing and in managing teams. Coursera has a writing course which people seem quite happy with.

Legate-numpy looks like a very interesting NVIDIA-supported effort to build a drop-in replacement for numpy which supports GPUs but also distributed GPU computing, using the runtime from the PGAS language Legion.

RCT Newsletter