#54 - 11 Dec 2020

Employee review questions; starting digital humanities projects; COVID hitting research charities; command-line interface guidelines; CentOS -> CentOS Stream

Research Computing Teams Link Roundup, 11 Dec 2020

Hi, everyone!

I don’t have anything of my own to share with you this issue, but it’s been an interesting week in research computing, so there are lots of nuggets in the link roundup.

As always, if you find anything particularly interesting, or if there are topics you’d like covered, please let me know! Feedback is a gift, and even though I’ve been shamefully slow getting back to a couple of readers this week I really value your thoughts. I enjoy getting email back, even about things that you don’t like or disagree with.

So on to the roundup!

Managing Teams

42 Employee Review Questions Every Manager Should Ask - Fellow.app

There are always lots of “annual review” posts up towards the end of the year, but even if you don’t do annual reviews, this is a good set of topics to make sure you raise with your team members periodically - on goals, strengths, what they like best/least about their current role, what new challenges they’d like, how your working relationship with them is going, and how the working relationships within the team are going.

Starting in research computing (bioinformatics) as a non-academic - Mick Watson on Twitter

An important reminder for those of us in research - and especially in academia - that the world of research is intimidating, unfamiliar, and as a result very stressful to team members who weren’t trained in that area. Watson eventually became enormously successful anyway, but it was a pretty trying experience. Although he doesn’t say it, my guess is that with a little extra mentorship and support, he likely could have flourished a bit earlier and with a lot less anguish.

Managing Your Own Career

Where Are the Career Paths for Staff on Campus? - Lee Skallerup Bessette, Chronicle of Higher Education

In research computing we frequently talk about the lack of career paths for ourselves as managers and - to an even greater degree - our team members as individual contributors. Bessette’s article reminds us that this is an issue at research institutions quite broadly - in traditional support units like HR and finance as well as in newer ones like our own or, at universities, educational design (a field that is booming with recent interest in virtual and hybrid education). In all of those areas Bessette notes consequences quite familiar to us - salary compression, a feeling of stagnation even if responsibilities are growing, and people hopping between institutions or projects, even when they’d otherwise be quite happy to remain, just to try to get some kind of vertical trajectory.

So the bad news is that the problem is bigger than our units, but the good news is that we may very well have allies in trying to set out career ladders for IT and management tracks.

Product Management and Working with Research Communities

Visualizing Objects, Places, and Spaces: A Digital Project Handbook - Beth Fischer and Hannah Jacob, Wired!, Duke University

Fischer and Jacob are starting off on what looks like a really exciting project, assembling a handbook for starting digital projects in the humanities. It will be interesting to see what comes of this effort - and if your team has helped support such a project, they’re seeking contributions.

Cancer Research UK forced into £45m ‘dramatic cut’ to research - Mićo Tatalović, Research Professional News (paywall)

The headline is enough here - those of our researcher colleagues who perform charity-funded research face a bleak couple of years. Such funds depend heavily on donations, which are always sensitive to the economy and don’t bounce back as quickly as the economy does.

Research Software Development

Finding Critical Open-Source Projects - Abhishek Arya, Kim Lewandowski, Dan Lorenc and Julia Ferraioli – Google Open Source

One reason for the push to cite research software is that it’s one of the few ways we have to give credit to the work, and to show that it’s widely used as a justification for funding for improving/maintaining the software. Software citation remains an uphill struggle, especially for software that a research user might not directly interact with (like key libraries). This is a problem in areas other than research, of course - maintenance of crucial pieces of internet software is famously under-funded. In this article, the authors describe how Google is trying to address this by creating a metric to evaluate how critical open source projects are (by tracking activity, issues, etc. as well as use as a dependency), to make these tools’ importance more visible.

As more and more research software - especially that software which becomes widely used as a dependency - moves to online code repositories, it’s hard not to imagine that this approach would be increasingly feasible in our community, and that we might be able to use such metrics to justify funding.
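To make the idea concrete, here is a rough sketch of how several repository signals could be folded into a single 0-to-1 score: each signal is log-scaled, capped at a threshold, and weighted. This is an illustration of the general approach, not necessarily the formula the authors use; the signal names, weights, and thresholds below are all made up.

```python
import math

# Hypothetical signals: name -> (weight, threshold).
# A signal at or above its threshold contributes its full weight.
# Thresholds must be > 0 so the log in the denominator is never zero.
PARAMS = {
    "contributors": (2.0, 5000),
    "commits_last_year": (1.0, 1000),
    "dependents": (2.0, 500000),
    "open_issues": (0.5, 5000),
}


def criticality(signals, params=PARAMS):
    """Weighted average of log-scaled signals, each capped at 1."""
    total_weight = sum(weight for weight, _ in params.values())
    score = 0.0
    for name, (weight, threshold) in params.items():
        value = signals.get(name, 0)  # missing signals count as zero
        # Log scaling stops one huge number from dominating the score.
        score += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return score / total_weight
```

Calling `criticality({"dependents": 1200, "contributors": 40})` gives a number between 0 and 1 that could be compared across packages; the interesting (and contestable) part is choosing the weights.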

Command Line Interface Guidelines - Aanand Prasad, Ben Firshman, Carl Tashian, Eva Parish

We create a lot of command line tools in research computing, and we tend to unconsciously mimic common Linux tools when doing so - but those were designed a long time ago, and their interfaces were frequently optimized for use in scripts rather than interactive use by people. This set of command line interface guidelines formulates a consistent philosophy for modern use cases, integrates advice from a number of different style guides, has a short and opinionated list of tools for Go, Python, Node, and Ruby that mostly support the guidelines, and has suggestions for how to distribute the resulting command line tools. It’s a good and not very long read.
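As a small illustration of the kind of advice involved, here is a sketch of a tool that writes results to stdout and errors to stderr, returns a non-zero exit code on failure, shows an example in its help text, and offers machine-readable output behind a flag. It uses Python’s standard argparse; the tool name `countlines` and its options are hypothetical, and this is my reading of a few of the guidelines rather than a reference implementation.

```python
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="countlines",  # hypothetical tool name
        description="Count the lines in a text file.",
        epilog="Example: countlines results.txt --json",  # help leads with usage
    )
    parser.add_argument("path", help="file to read")
    parser.add_argument(
        "--json",
        action="store_true",
        help="emit machine-readable JSON instead of plain text",
    )
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    try:
        with open(args.path, encoding="utf-8") as handle:
            count = sum(1 for _ in handle)
    except OSError as err:
        # Human-readable error on stderr, non-zero exit code for scripts.
        print(f"countlines: cannot read {args.path}: {err.strerror}", file=sys.stderr)
        return 1
    if args.json:
        print(json.dumps({"path": args.path, "lines": count}))
    else:
        print(f"{count} lines in {args.path}")
    return 0
```

In a real script you would finish with `sys.exit(main())` so the return value becomes the process exit code.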

How We Built Scalable Spatial Indexing in CockroachDB - Sumeer Bhola, CockroachDB

Geospatial data is a growing part of research computing, and this is a nice overview of how one database project implements spatial indexing to implement efficient spatial queries (things like, “Find all objects in this region”).

This is neat in and of itself, but it’s also a great reminder of the power and transferability of research computing expertise. The same kind of data structure and concept - space-filling curves - that helps power geospatial database indexing also helps with certain kinds of adaptive mesh refinement for high-speed computational fluid dynamics, or even some kinds of particle tracking methods. The investments we as a team make in deeply understanding one kind of approach often have implications for very different kinds of problems.
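For anyone who hasn’t met space-filling curves before, here is a minimal sketch of the simplest one, the Z-order (Morton) curve, which maps 2-D grid cells to a single integer by interleaving the bits of the coordinates. It is meant only to show the concept; it is not necessarily the curve the article’s index uses, and the function names are my own.

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits land on odd positions
    return code


def morton_decode(code: int, bits: int = 16) -> tuple:
    """Recover (x, y) from a Z-order index."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# Sorting the cells of a 4x4 grid by Morton index visits each 2x2 block in turn.
cells = [(x, y) for y in range(4) for x in range(4)]
ordered = sorted(cells, key=lambda cell: morton_encode(*cell))
```

Because cells that are close in 2-D tend to get close indices, an ordinary sorted 1-D index can answer “what is in this region” with a handful of range scans; the same ordering trick is what lets mesh and particle codes keep spatial neighbours near each other in memory.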

Research Computing Systems

CentOS Project shifts focus to CentOS Stream - The CentOS Project
The future of Linux distributions in the age of docker and k8s - Joe Landman

Some big news this week, long feared after the announcement of CentOS Stream: stable CentOS is being replaced. CentOS Stream, rather than being a clone of the stable, long-term-supported (but costly) RHEL distribution, is a continuously updated distro showcasing what will appear in RHEL. Since the main reason for using CentOS in research computing is its stability, this has caused some concern - people are wondering what to switch to, and a new distro, Rocky, is being started by some of the original CentOS folks, with the goals of the original CentOS.

Landman’s article is interesting to read in that context. As we start packaging things in terms of applications rather than libraries, the relationship of research computing (and of computing in general) to OS distributions is changing - maybe surprisingly slowly, all things considered. In HPC environments it has long been common to have multiple alternate versions of all key libraries, compilers, etc. - essentially entirely parallel “distributions” - built and maintained. Even on personal systems, outside-the-distro package management systems like Homebrew/Linuxbrew are commonplace. Linux distributions already exist that are very slim and optimized to be nothing more than a container host. Is that the way forward for shared research computing systems, even those not using containers?

ZFS: You should use mirror vdevs, not RAIDZ - Jim Salter, JRS Systems

Salter offers a deep look into ZFS, virtual devices (vdevs), pools, and RAIDZ, and explains why nowadays mirroring vdevs is generally a better choice than RAIDZ.

Emerging Data & Infrastructure Tools

Don’t Panic: Kubernetes and Docker - Jorge Castro et al., Kubernetes Blog

If this directly affects you, you’ve probably already seen this or related discussions - yes, Kubernetes is dropping support for Docker, and it’s not necessarily a big deal.

Since Docker (or, recently, Singularity) is the main way most of us create and interact with OCI-compliant container images, we tend to conflate that particular runtime and toolset with containers in general. But Kubernetes dropping Docker from its supported runtimes may not change anything about your interactions with Kubernetes at all - using Dockerfiles and docker build to make images is fine. On the other hand, local tooling that uses the Docker runtime - things like docker ps - to interact with containers in the Kubernetes cluster will have to change.

Hewlett Packard Enterprise […] high performance computing […] HPE GreenLake - HPE Press Release

(Good product management tip here, too - if you want an announcement to get some press, make sure people can make puns in the headline, e.g. HPE “floats” HPC-as-a-Service with GreenLake Cloud, HPC Does a Cannonball into HPE’s Greenlake).

Interesting to see still other partners getting into the HPC-as-a-service game, with HPE.

Events: Conferences, Training

Christmas SORSE Event: When Spreadsheets Attack! (and other maths disasters.) - 17 Dec, 14:45 UTC, Free online

From the webpage:

Are you already looking forward to the festive break for some relaxation? Yes, so are we! But just before you close your computer down for possibly the last time in 2020, join us for a lighthearted look at When Spreadsheets Attack! with the hilarious, well-known standup comedian and mathematician, Matt Parker, followed by some tales from the community.

Qwiklabs Kubernetes training - 30 days free if signup by 31 Dec

This Google cloud post lists a number of training opportunities; maybe most relevant for our community is this item:

If you sign up by December 31, you’ll get access to unlimited Kubernetes training and the opportunity to earn Google Cloud skill badges on Qwiklabs at no-cost for 30 days. We recommend you begin with the following quests on Qwiklabs: “Deploy to Kubernetes in Google Cloud” and “Kubernetes Solutions.”

Software Sustainability Institute Collaborations Workshop 2021 - 30 March - 1 Apr, Online, £50

From the website:

The Software Sustainability Institute’s Collaborations Workshop series brings together researchers, developers, innovators, managers, funders, publishers, policy makers, leaders and educators to explore best practices and the future of research software. Collaborations Workshop 2021 (CW21) will take place online from Tuesday, 30 March to Thursday, 1 April 2021.


Speaking of modern command line tooling - fig for visual applications and shortcuts for command line tools looks really interesting.

NAND game - get all the way to building a simple processor using just NAND gates.

Good quick crash course to git stash.

Git wip, a git alias to list branches and when you last worked on them, not that any of us ever create a branch and move on and never do anything with it only to be surprised by its existence later.

A reminder you can use sockets directly from the shell. This plus named pipes and process substitution are my favourite underused shell tricks.

As cloud data engineering technologies mature, they’re going to start getting rewritten with performance in mind. Seastar, a C++ framework using concepts that would be familiar to those working on high performance multithreaded research software, has already had success doing just that with Cassandra and Memcached. Here’s a quick intro.

Vim tips for the intermediate user.

An introductory tutorial for PostgREST. PostgREST is an unusually powerful low-code way of spinning a REST API up atop a (Postgres) database.

A nice tutorial use case for GitHub Actions for a blog, using issues and pull requests plus GitHub Actions to automate some steps for a Hugo blog (and to nudge the author to get those blog post ideas out).

Cute tutorial for federated learning - a topic close to my heart - with Python.

Free pre-production PDF version of a really interesting looking book, High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications by John Wright and Yi Ma.

Nice debugging story/review of what happened when an Advent of Code website couldn’t keep up with demand. It’s also a really nice review of all of the things that are likely to go wrong with a simple web application, finally concluding with an extremely unlikely culprit.

LinkedIn has some neat-looking School of RSE training materials.