#85 - 30 July 2021

Hi there:

We’re going into a long weekend here in Toronto - the second-to-last one of the summer - and it’s very much needed. We have a number of pretty ambitious efforts we’re working on, and it’s been a long year already. I hope that you and your team are taking care of yourselves, and that you in particular as manager or team lead are taking some time off. There’s an article below on the importance - both for you and your team - of you taking some time off, to recharge yourself and give your team the opportunity to step up.

Also, there was some interest in the AWS ARM HPC hackathon that was in the roundup last week. I know that a number of readers are, like me, in the genomics space right now. Let me know if you think you or your team might be interested in participating in a similar week-long hackathon for ARM specifically around genomics; as always, just hit “reply” if you get this in your email, or email jonathan@researchcomputingteams.org if you want to talk about that or about anything that comes up in the newsletter.

And on to the roundup:

Managing Teams

When Do We Actually Need to Meet in Person? - Rae Ringel

In the past 17 months, having to work and communicate in new ways, we’ve learned to be thoughtful in planning how to communicate and work together. With teams starting to be able to meet in person, there’s no need to discard that thoughtfulness! The right approach for a meeting depends on the goals and purpose of the meeting.

Here Ringel offers a simple framework for thinking about when a meeting benefits from being in-person. Complex goals and building/maintaining relationships push towards favouring in-person meetings, while simple goals and working on tasks favour hybrid or asynchronous meetings. (Incidentally, those are also the meetings where very strong meeting facilitation skills are the most necessary).

Do we need to meet in person? This image lays out some example reasons for a meeting on a spectrum, with conflict resolution being complex and relationship-driven at the in-person side, while updates and briefings are at the hybrid or asynchronous side.

Relatedly, a lot of managers are starting to think of ice-breaker/team-building activities to get people used to working together in person again, particularly when new members have joined the team while it was purely distributed. Lots of people are suggesting games like Zip Zap Boing - what sorts of things have people tried?

Writing Better Job Ads - Eli Weinstock-Herman

This is a nice lengthy post on writing job ads. And given what I see scanning job ads for research computing team managers, the advice is needed!

There’s too much for me to completely summarize, but some key points:

A Job Ad is a Landing Page… A job ad is marketing. An advertisement.

I can’t agree with this enough. Even if what you post on your institutional jobs website has to include all kinds of meaningless boilerplate and a dry list of job requirements - and at universities and hospitals there’s definitely some of that - there’s little to nothing stopping you from posting a job ad elsewhere, on your team’s website or on external job boards. You can direct people to the dry-as-dust “official” posting to apply.

What’s worse, most of the stuff we tend to put into job descriptions and job ads is… well:

[…] I’m more and more looking at “5+ years of (skill)” as an intellectually lazy statement. […] I wrote a job ad for a fungible human gear.

God yes. Even if “5 years of C++” (or whatever) was a meaningful measure, as if any given 12-month period of experience working with C++ was interchangeable with any other, it’s an input. A person with that laundry list of inputs might, if you’ve done your job well, be able to be a capable team member, but what you care about are the outputs - the results the new team member helps the team achieve. And other combinations of inputs might help the new team member accomplish those things just as well or better.

Weinstock-Herman makes the following suggestions for a process:

Start with the end in mind (always a good focus)
Create the core of the job ad first:
- What will the candidate achieve?
- What are expectations from a team member in this role?
- What are the specific tools/processes in use?
- What does the team do, why is it interesting, what’s the impact?
- What does compensation, benefits look like?
Boil it down to a pitch
Work on tone, length, engagement
Test, test, test
Post thoughtfully

We have huge advantages in research for hiring. We’re helping advance the frontier of human knowledge. We’re doing meaningful work, not trying to drive up click rates on advertisements. We offer the possibility of going between multiple quite different projects, learning both new tech and new science along the way, and the possibility of outsized impact. Why do so many of our job ads read like working in our field is a chore, that could easily be done by anyone with 3 years experience in linux and 4 years in “a scripting language”?

Managing Your Own Career

Questions for potential employers - Carter Baxter
My questions for prospective employers (Director/VP roles) - Jacob Kaplan-Moss

We do a lot of discussion of hiring from the hiring manager side of the table in the newsletter, but when thinking of our own career prospects it’s worth considering what we should ask when we’re the candidate, too.

Asking questions about how the position came to be free, the goals of the organization, the goals of the position, what six-month success looks like, how much autonomy the role has, travel requirements - these are all important things to know before you take a job offer.

Out of Office Alert: Managers Need Vacations Too! - Samantha Rae Ayoub, Fellow

It’s important to take time off to recharge, even though as managers we’re often not great at this. It’s a little too easy to convince ourselves that our firm hand on the tiller is too important to completely let go… and that’s a self-fulfilling prophecy. You’re robbing yourself of needed R&R, and your team members of the chance to step up in your absence, by not completely checking out. And the more often you completely step away, the easier it gets for you and the team.

Ayoub goes through ten steps - the key ones to my mind are:

Prep a “While I’m Away” list
Put one person in charge
Ask your team to keep a collaborative set of notes
Turn on your Out of Office Alert
Do not reply to your email or voicemails
Carve out 2 hours in the morning when you get back to get caught up

One really clever suggestion I don’t know that I’ve read before is to make that “while I’m away” document a shared writable document, and have it be not only a checklist of things to do but somewhere where people keep notes of what was done, what happened at the meeting with Prof X, etc - so you come back to a briefing document to catch you up.

The other really crucial thing is to put one person in charge while you’re away - or at the very least to have a very clear decision making process. Decisions will have to be made in your absence, and the team needs to know how to make them. You can rotate between people, but it should be someone who has a pretty good big-picture view of the work of the team.

Cool Research Computing Projects

CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance - Samuel M. Nicholls et al.
How UK Scientists Developed Transformative, HPC-Powered Coronavirus Sequencing System - HPC Wire

The UK has led the world in sequencing and surveilling the evolution of SARS-CoV-2, the virus that causes COVID-19; and roughly a quarter of the world’s SARS-CoV-2 genomics data has passed through the COVID-19 Genomics UK’s (COG-UK) CLIMB-COVID infrastructure, which is described in this paper and HPC Wire article.

While COG-UK’s sequencing efforts are distributed, the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) takes a hub model for integrating the genomic and epidemiological data.

The heart of CLIMB-COVID is Majora, a Django-based application with both web and command-line UIs, but they also had to create simple sample-naming schemes, nextflow pipelines, and MQTT messages communicating between pipelines and Majora, as well as online phylogenetics analysis, cluster identification, and visualisation.

The CLIMB-COVID pipeline

Research Software Development

How Herbie Happened - Pavel Panchekha

Herbie is an automatic rewriter of numerical expressions that attempts to find equivalent but more numerically accurate expressions, available for local use but also through a web interface. I don’t think it’s as well known in technical computing circles as it ought to be; I only learned of it as my own use of methods that required careful numerical analysis was winding down, even though it’s been in development since 2013 or so.
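To make the idea concrete, here is a small hand-written sketch of the kind of rewrite a tool like this looks for. It is not Herbie output; the expression, the input value, and the function names are my own illustrative choices. Subtracting two nearly equal square roots loses most of the significant digits, while the algebraically equivalent form (multiplying through by the conjugate) does not.

```python
import math
from decimal import Decimal, localcontext


def naive(x: int) -> float:
    """sqrt(x + 1) - sqrt(x), written the obvious way."""
    return math.sqrt(x + 1) - math.sqrt(x)


def rewritten(x: int) -> float:
    """Same quantity via the conjugate, avoiding the subtraction of near-equal values."""
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))


def reference(x: int) -> float:
    """High-precision reference value, computed with 50 significant digits."""
    with localcontext() as ctx:
        ctx.prec = 50
        return float(Decimal(x + 1).sqrt() - Decimal(x).sqrt())


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


# Illustrative input: large enough to trigger cancellation, small enough that
# x and x + 1 are both exactly representable as double-precision floats.
x = 10**15
exact = reference(x)
print("naive     relative error:", relative_error(naive(x), exact))
print("rewritten relative error:", relative_error(rewritten(x), exact))
```

The two functions compute the same mathematical quantity; only the order of floating-point operations differs, and that is the search space such a rewriter explores.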

Panchekha’s article - an older one but one which is circulating again - gives an overview of the story of Herbie development, beginning when he was a grad student. It originally started with a recognition that not much CS or programming language work was being done on the quite important task of improving numerical accuracy, and so even modest progress would be important. The lessons he took from the Herbie effort are:

Tackle ambitious problems
Know how you work best - he doesn’t work well alone
Good benchmarks guide research
Generating reports from search processes is great for debugging
You never know what will be important
Some small things matter a lot
Don’t submit papers too early
If you keep rewriting something, think deeper
Make a demo

One of the great things about this article is the bracing honesty with which he writes about false starts and dead ends. Also, the online demo, which was originally meant more just to have something interesting on the project’s web page, was very important both for communicating what Herbie does and getting feedback from users. All in all this is a nice behind-the-scenes writeup of a research computing project.

Research Data Management and Analysis

Life on the diagonal — adventures in 2-D time - Luke Plant

It’s pretty common in research computing to not just manage data but the history of the data - how it’s changed over time. There are solutions like temporal tables in SQL:2011 (or other approaches, lumped together under the term of art “slowly changing dimension”) to be able to view what the data values looked like at some earlier time/version. In addition, there are now a number of newer “git, for data” solutions which make it easy for people to collaboratively update the data while maintaining history. It’s all very cool stuff, and if those tools match any of your use cases, you should absolutely use them; it’s not something you have to implement yourself any more.

But when the data that’s being updated at different times is itself a timeline, all of this can get kind of hard to think about. Plant walks us through a mental model for thinking about this: two-dimensional time, with the event time (when something actually happened) and the knowledge time (when it was recorded in the database). He makes the analogy that we experience life on the diagonal of this 2-d time; when something happens and when we learn of it are roughly equivalent in importance.

The fundamental reason that this is hard to think about is that “time” means two things, which is why the introduction of “event time” vs “knowledge time” as terms is very valuable. Incidentally, I’ve had two hours of meetings this month trying to come to a technical solution for a data modelling problem, only for us to realize as we were hanging up on the second meeting that we were using “dataset” to mean two slightly different things and that was causing the problem. Naming things is important!
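The event-time / knowledge-time distinction can be sketched in a few lines. This is my own minimal illustration, not code from Plant’s article; the `Fact` record, the `as_known_on` helper, and the sample data are all hypothetical. Each record carries both times, and a query asks “what did we believe on knowledge day K about the state of things as of event day E?”

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    sample_id: str
    status: str
    event_time: date      # when it actually happened
    knowledge_time: date  # when it was recorded in the database


def as_known_on(facts: List[Fact], sample_id: str,
                event_day: date, knowledge_day: date) -> Optional[str]:
    """Status as of event_day, using only what had been recorded by knowledge_day."""
    visible = [f for f in facts
               if f.sample_id == sample_id
               and f.event_time <= event_day
               and f.knowledge_time <= knowledge_day]
    if not visible:
        return None
    # Latest event wins; for the same event time, the latest recording wins.
    return max(visible, key=lambda f: (f.event_time, f.knowledge_time)).status


facts = [
    Fact("S1", "received",  date(2021, 7, 1), date(2021, 7, 1)),
    Fact("S1", "sequenced", date(2021, 7, 5), date(2021, 7, 12)),  # recorded late
    Fact("S1", "failed QC", date(2021, 7, 5), date(2021, 7, 20)),  # later correction
]

for known in (date(2021, 7, 8), date(2021, 7, 15), date(2021, 7, 25)):
    print(known, as_known_on(facts, "S1", date(2021, 7, 6), known))
```

The same event day gives three different answers depending on the knowledge day, which is exactly the off-diagonal territory that makes this hard to reason about without separate names for the two axes.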

Postgres Full-Text Search: A Search Engine in a Database - Kat Batuigas

While the biggest story of databases over the past 15 years has been the divergence and specialization into a diverse range of capabilities, the second biggest story has been partial convergence - NoSQL developing partial ACID capabilities and stalwart databases like PostgreSQL and MariaDB developing capability for sophisticated JSON handling and text indexing.

If your use case is principally full-text search you’d be better off with Elasticsearch or moral equivalent, of course, or if all you needed were JSON objects you’d go with MongoDB, but increasingly if you need some full text capability or some unstructured JSON support in something that’s already using a relational database, there’s less and less reason to introduce another data store. In this article, Batuigas walks us through what full-text indexing can and can’t do in Postgres.
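For a sense of what “some full text capability” inside an existing Postgres database looks like, here is a rough sketch. The table `papers(id, title, abstract)`, the index name, and the `search` helper are hypothetical, and I’m assuming non-null text columns and a DB-API connection that uses `%s` placeholders (as psycopg does). The Postgres pieces themselves (`to_tsvector`, `plainto_tsquery`, `ts_rank`, the `@@` match operator, and GIN indexes) are the standard built-ins.

```python
# Hypothetical schema: papers(id, title, abstract), both text columns NOT NULL.

# An expression index so matches don't have to re-parse every row.
CREATE_INDEX_SQL = """
CREATE INDEX papers_fts_idx ON papers
USING GIN (to_tsvector('english', title || ' ' || abstract));
"""

# The WHERE clause repeats the indexed expression exactly so the index can be used.
SEARCH_SQL = """
SELECT id, title,
       ts_rank(to_tsvector('english', title || ' ' || abstract),
               plainto_tsquery('english', %s)) AS rank
FROM papers
WHERE to_tsvector('english', title || ' ' || abstract)
      @@ plainto_tsquery('english', %s)
ORDER BY rank DESC
LIMIT %s;
"""


def search(conn, terms: str, limit: int = 10):
    """Run a ranked full-text query; `conn` is any DB-API connection (assumed)."""
    cur = conn.cursor()
    cur.execute(SEARCH_SQL, (terms, terms, limit))
    return cur.fetchall()
```

Calling `search(conn, "genome assembly")` would return the best-ranked rows whose stemmed title or abstract contains both words.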

Digital Humanities Project Charters and Data Management Plans - Marie Léger-St-Jean

Léger-St-Jean posted on twitter her breakdown of (so far) four different project charters and one data management plan for digital humanities projects, to inform those planning similar documents for other projects.

Emerging Technologies and Practices

biowasm - Robert Aboukhalil
Genome Ribbon - Maria Nattestad, Chen-Shan Chin, Michael C. Schatz

I know I’ve been on a bit of a WebAssembly kick here lately and that may seem odd to readers coming from (say) HPC. But here’s a lovely example of a package of real bioinformatics tools and libraries (biowasm) distributed as WebAssembly packages ready to be run interactively in the browser - and with Web Workers, being able to load files from the local file system. And Genome Ribbon is an early example of the kind of complex applications that can be built this way - visualization of complex genomic rearrangements in the browser, without any of the data ever leaving your computer.

A Linux Kernel Implementation of the Homa Transport Protocol - John Ousterhout, USENIX ATC ’21

TCP/IP is an amazing technological achievement and makes the internet possible. Wide area networks were how the internet began, but individual data centres with tens or hundreds of thousands of nodes very much were not, and TCP isn’t great within a datacenter or large cluster. Google’s described their user-space TCP replacement between services, Snap, which they’ve been using since 2016.

In this paper and slide deck, Ousterhout describes their linux kernel implementation of the Homa protocol, which the developers feel addresses the many failings of TCP within a datacenter:

Connection oriented - high space and time overheads
Stream oriented - but most within-datacentre communications is more like remote procedure calls, and stream oriented approaches cause head-of-line blocking
Fair sharing of bandwidth increases latency for short messages
Sender-driven congestion control requires buffers to detect congestion
In-order packet delivery makes load balancing very difficult

It’s interesting to compare this approach with that of AWS’s Scalable Reliable Datagram, which we covered in #80, particularly the concern with congestion and tail latency. For all workloads and message sizes tested, and alongside a variety of other network traffic, Homa had a one-to-two orders of magnitude advantage in P99 latency, and 2.7-7.5x improvements in median latency. Homa focuses very much on latency, using receiver-driven congestion control and prioritizing messages with the shortest remaining processing time first.
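To see why fair sharing hurts short messages and shortest-remaining-first helps, here is a toy model of my own. It is not Homa’s actual protocol: it is just a single link that sends one unit per tick, with all messages arriving at time zero. The function names and message sizes are illustrative assumptions.

```python
from typing import List


def fair_share(sizes: List[int]) -> List[int]:
    """Round-robin: every unfinished message gets one unit in turn.

    Returns the tick at which each message completes.
    """
    remaining = list(sizes)
    done = [0] * len(sizes)
    t = 0
    while any(r > 0 for r in remaining):
        for i in range(len(remaining)):
            if remaining[i] > 0:
                t += 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    done[i] = t
    return done


def shortest_remaining_first(sizes: List[int]) -> List[int]:
    """Each tick, send one unit of the message with the least data left."""
    remaining = list(sizes)
    done = [0] * len(sizes)
    t = 0
    while any(r > 0 for r in remaining):
        i = min((j for j, r in enumerate(remaining) if r > 0),
                key=lambda j: remaining[j])
        t += 1
        remaining[i] -= 1
        if remaining[i] == 0:
            done[i] = t
    return done


sizes = [100, 1, 2]  # one bulk transfer and two short RPC-like messages
print("fair share:              ", fair_share(sizes))
print("shortest remaining first:", shortest_remaining_first(sizes))
```

In this toy case the bulk transfer finishes at the same tick under both policies, while the short messages finish sooner under shortest-remaining-first. The real protocol has to achieve this across a network with receiver-issued grants and switch priority queues, which is where the engineering in the paper goes.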

Calls for Submissions

Several more SC21 workshops have calls:

Fifth International Workshop on Software Correctness for HPC Applications (Correctness 2021) - Papers due 9 Aug
16th Workshop on Workflows in Support of Large-Scale Science (WORKS21) - Papers due 15 Aug
ScalA21: 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Papers due 27 Aug
MCHPC’21: Workshop on Memory Centric High Performance Computing - Submissions due 31 Aug

IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT) - 6-9 Dec, Leicester UK, papers due 15 Aug

Topics include Big Data Science, Infrastructure and Platforms, Applications, Visualization, and Trends and Challenges.

8th International Workshop on Large-scale HPC Application Modernization (LHAM) - Abstracts due 27 Aug, papers due 1 Sept

From the call:

The International Workshop on Large-scale HPC Application Modernization offers an opportunity to share practices and experiences of modernizing practical HPC application codes, and also discuss ideas and future directions for supporting the modernization.

Topics include

Programming models, languages and frameworks for facilitating HPC software evolution and refactoring.
Algorithms and implementation methodologies for future-generation computing systems, including manycores and accelerators (GPUs, Xeon Phi, etc).
Automatic performance tuning techniques, runtime systems and domain-specific languages for hiding the complexity of underlying system architectures.
Practices and experiences on porting of legacy applications and libraries.

Call for Papers - [Electronics] Special Issue on Program Analysis and Optimizing Compilers for High-Performance Computing - Papers Due 1 Sept

From the call:

The recent technical trend toward extreme heterogeneity in processors, accelerators, memory hierarchies, on-chip interconnect networks, storage, etc., makes current and future computing systems more complex and diverse. This technical trend exposes significant challenges in programming and optimizing applications onto heterogeneous systems. The purpose of this Special Issue is to bring together application developers, compilers and other tool developers, and researchers working on various program analysis and performance optimization techniques for an exchange of experiences and new approaches to achieve performance portability in the era of extremely heterogeneous computing.

Topics include:

Program analysis tools and methodologies to understand program behavior and resource requirements;
Efficient profiling and instrumentation techniques to characterize applications and target systems;
Code generation, translation, transformation, and optimization techniques to achieve performance portability;
Optimizing compiler design, practice, and experience;
Methodologies for performance engineering

Events: Conferences, Training

Software Engineering Challenges and Best Practices for Multi-Institutional Scientific Software Development, Keith Beattie LBNL - 4 Aug, 1pm EDT, Free registration required

Part of the best practices for HPC Software Development Webinar series:

In this webinar we present the challenges faced in leading the development of scientific software across a distributed, multi-institutional team of contributors, and we describe a set of best-practices we have found to be effective in producing impactful and trustworthy scientific software.

National Center for Women & Information Technology - US-RSE DEI-WG Speaker Series - 12 Aug, 4pm ET

This presentation explores why diversity matters to innovation, how implicit biases play out in technical work cultures, and what actions individuals can take to create more inclusive technical cultures. Attendees will learn key features of strategic, research-based approaches to address the biases and barriers that limit diverse participation in computing.

5th EAGE Workshop on High Performance Computing for Upstream - Heterogeneous HPC: Challenges, Current and Future Trends, 6-8 Sept, €175 - 370

“Upstream” here, for those not in the industry, means exploration for oil & gas, but a lot of the talks here have pretty broad HPC applicability - matrix-free optimization, seismic wave simulation, accelerated computing, HPC modernization for cloud, workload management, DPC++, etc.

17th Int’l Workshop on OpenMP - 13-16 Sept, Zoom, Univ of Bristol, £70/£90

The first day is OpenMPCon, focusing on vendors (including LLVM) and updated support for OpenMP; the next three days focus on using OpenMP (such as a report on an OpenMP Hackathon, or building a portable GPU runtime atop OpenMP) and extending OpenMP (hardware transactional memory, extending the tasking model).

Random

Interested in Digital Signal Processing? Steven W. Smith has a huge book “The Scientist’s and Engineer’s Guide to Digital Signal Processing” available for free online.

Also available as a pre-production draft of a book: small summaries for big data, covering sketches of large datasets.

A lovely science communication example of buoyancy forces, stability, and ship design with interactive diagrams and illustrations.

An overview of netcat and variants.

121 questions for managers and ICs for one-on-ones.

If you’ve been using google drive for a while, links shared earlier than 2017 will break shortly.

LaTeX and GFM Markdown table generators.

A deep dive into how python imports work.

A more useful “how to think about Git” tutorial that tries to actually convey meaning rather than just trying to sound smart.

In praise of “baking data in” to application or software deployments.

ConnectorX is a new package that loads DB data into pandas dataframes quickly and with little memory overhead.

Before containers became ubiquitous I was pretty sure unikernels were going to take over (I really liked the Blue Gene architectures) and I still think they have a lot of promise. Of course, I thought the WWW was a fad and that gopher was going to be the future, too, so…. anyway, there’s a new unikernel ecosystem out now, nanos.

An overview of HTTP security headers.

Getting started with a bullet journal.

RCT Newsletter