Several readers responded about their version of the challenges I described last week - most teams are still working remote; some had reduced-occupancy office space available for those who wanted to come in, although it sounds like during the Omicron wave that was sparsely taken up where it was allowed at all. And those who mentioned it agreed uniformly on how hard it is to hire these days.
I think in some places we need to come up with our own solutions to these problems that meet the challenges our teams face (small, relatively slowly-growing teams with wide scope and demands that can change month-to-month and weird funding), and in other places there are opportunities to learn from other kinds of groups. This newsletter is a great medium for sharing resources, but it’s a bit of a slow way to have a back-and-forth on topics of timely interest! I’ve mentioned it before and included it on lists of resources, but the Rands Leadership Slack is a great community for technical managers and leaders, largely from tech and startups; if that’s something that interests you, dear reader, let me know you’ve joined or are considering joining and we can create a research computing and data channel for discussion of our particular needs.
In the meantime, news on my front. I’ve been thinking about people leaving jobs a lot lately, because we just found out one of my team members is leaving, which leaves a big hole in the team - but also because I’ll be changing jobs, too. (More about this in a bit). So we’ve been going through two off-boarding processes at the same time, one (mine) much lengthier than the other, involving two somewhat different groups of people, and requiring knowledge transfer at different levels of granularity - an interesting vantage point on the whole situation.
The one thing that hits home most is that I really thought I had absorbed the lessons of an article I’ve referenced a couple of times in the roundup, Always Be Quitting. That article described the advantages - for stakeholder clarity, for your own perspective, and for growth opportunities for your team members - of continually documenting what you do and the state of things, and of teaching team members who want to grow in particular directions different parts of your job. But when I started this offboarding process at the end of last year, I discovered how superficial my efforts in that direction had really been.
Writing a many-page document describing my activities and the state of tasks, so that responsibilities could be handed off to various people, was an incredibly clarifying experience. It was also one in which I realized how much had gone undocumented previously. Had I written that document a year earlier, and seen everything in one place, I would have focused my energies in the past year in subtly but noticeably different ways - and I would have done a much better job of delegating the right tasks to the right people. Writing bullet points of priorities and needs and todos, scattered in various places, didn’t have nearly the same clarifying impact on me, much less on others.
I’m also surprised, as I work on offboarding myself, by how much information I thought I had conveyed many, many times in many, many channels had not really been taken in. I’ve seen this come up in many handoff meetings. Luckily most of these things were already documented somewhere, but just because something is written down doesn’t mean people have committed it to memory. After all, it’s Jonathan’s job to know that stuff.
So in my next jobs, I will take the “making myself obsolete” part of the job much more seriously; this has been very eye-opening. But I’m really pleased with how the process is going, and how the team members are stepping up. Honestly, after five years the team and the effort are ready for and will benefit from a change; while there’ll be some disruption in the short term, the emerging new leadership team will be stronger than just having me at the helm.
I’ll talk more about my new job in the next issue - it’s a bit of a return to roots, and to being an IC again. But for now, the roundup!
The Manager’s Stack - Jamie Lawrence
What are your favourite life-of-a-manager or -lead tools? Not team tools - Slack, Zoom, GitHub, etc - but the things you personally use to make yourself productive as a lead or manager?
I can’t imagine work life anymore without myself and others using some kind of book-a-meeting-with-me tool like Calendly (even though that particular tool is now getting some pushback). I’ll never do the “actually, Tuesday afternoon’s not good for me” dance five times a week again. For group meetings, setting up a proper Doodle account with my calendar linked means that whenever I respond to a Doodle poll, it automatically fills in my available times.
I’ve got my mutant bullet journal approach just the way I like it for daily note taking (in an Artist’s Loft journal, like a Leuchtturm but much cheaper, with 120gsm paper being perfectly suitable for my preferred pens). Now that we’re purely remote, I no longer rely on online note taking for things like 1-on-1s, and have a paper template I like. Shared documents like quarterly goal setting with each team member are still done in Google docs running documents, which would doubtless horrify both HR and IT, so don’t tell them. For writing my own stuff, I like Dropbox Paper for live documents, mainly because it exports to Markdown, and for taking linked notes while learning stuff I’ve been using Foam. Sadly, I still haven’t found a personal task manager I like. Most recently I had been playing with Trello for that purpose, but, well, the search continues.
In this article Lawrence talks about his favourite tools as a manager. Not surprisingly, calendar tools play a big role. He prefers SavvyCal to Calendly, uses Reclaim to sync his personal and home calendars, and Fantastical on the Mac for viewing/editing his calendar. One tool he calls out which is really interesting is Krisp, intended for removing the user’s background noise from video calls. However, it also keeps track of what fraction of the meeting he’s talking, which is something we have to watch out for as managers and leads.
What are your favourite tool discoveries of the pandemic? What could you just not live without as a manager? Anything you made yourself, whether a document template or something else? Hit reply or email me at [email protected] and next issue I’ll include a summary.
Not everyone can become a great leader… - Matt Schellhas
This is a good first-person account of how pernicious the whole “great leaders are born, not made” nonsense is. Schellhas talks about being held back from becoming a manager for years because he didn’t look or seem the part. He did, in the end, become a (successful) manager and leader, and recognizes that while he at least got the chance eventually, many never do.
On top of discouraging people who don’t look and act like they came out of central casting for managers from gaining more responsibility and power, this really toxic “good managers have certain personality traits” belief also discourages people from learning how to be better managers, better leaders. How can you learn to be better at something that is innate, right?
Anyway, Schellhas’ article is worth keeping in your back pocket to send to people. We have to stomp out the nonsense “manager == personality type” idea. Being a good manager and leader is no different than any other profession - you have to develop certain skills and behaviours, and continue your professional development over the years to refine your own abilities and grow them in others.
14 strategies to shorten lengthy meetings - Hanna Ross, Fellow.app
It’s been a little while since we’ve had a good meeting article in the roundup. This one covers some ground we’ve seen before, but meeting management is so fundamental that it’s always worth reviewing.
Ross’s points, slightly trimmed for length:
Know how your org works (or how to become a more effective engineer) - Cindy Sridharan
Managing Up: The Do’s And Don’ts And Why It’s Important For Success - Sam Milbrath
As we grow more senior, we want our work to have more impact. To do that, we need to better understand the context of our work - within the institution, across our specific research community, and among our funders. Doing the best possible work on one technology or technique as the community is moving to another is a recipe for wasted effort. Great ambitions to expand a project without understanding what funders are looking for and your institution’s priorities are a recipe for frustration.
Sridharan gives some specific examples for understanding what is going on in your institution. The context of her article is the private sector, but much of it carries over. In particular, to be successful in the long term, it’s important to learn
A specific case of knowing how your organization works, and how your work fits inside that context, is “managing up”. Like “networking”, “managing up” has acquired a gross, smarmy reputation. But “networking” is just developing professional relationships with people in your community. And “managing up” is just growing a productive professional relationship with your boss.
The usual challenges to growing that productive relationship are that:
These aren’t that different from challenges working with your team members. They see things from their hands-on point of view that you don’t. They have communications styles different than your own. They face challenges you’re unaware of.
In both cases, the way you overcome these challenges is through regular conversations, learning about opportunities and needs, and chipping in where possible to help both the people and the overall mission succeed.
Milbrath’s article gives some do’s and don’ts for managing your single most consequential professional relationship.
Career Advice Nobody Gave Me: Never Ignore a Recruiter - Alex Chesser
If you’re interested in working in the private sector at all, this is a worthwhile read. No, you shouldn’t hop on a call with everyone who sends you spam on LinkedIn. But as Chesser says, there are real jobs out there with good recruiters sourcing for them. And you can’t always tell who is who from the messages.
Chesser provides some scripts for interacting with recruiters. The scripts aim to weed out low-value “just reaching out” messages while building a conversation with the recruiters that have career-changing possibilities. The scripts are well worth stealing.
In research and especially academia we really limit ourselves by not actively recruiting for new team members. Yes, recruiters are expensive - two months’ salary for a successful hire isn’t uncommon. But not being willing to pay that is more evidence that we aren’t willing to invest in excellent staff. (See also: not wanting to pay for good tooling). Almost everywhere in research you’ll see flowery paragraphs saying “people are our most important resource.” But too many teams are willing to spend months of people’s time creating and evaluating RFPs for equipment, while treating hiring as an afterthought worthy of little effort and less money.
Fixing Performance Regressions Before they Happen - Angus Croll
Performance matters for research software, although how much it matters depends on the application. I’ve seen groups with performance tests as part of their CI/CD pipeline – the Chapel language nightly performance tests come to mind — but performance is tricky. There’s always some noise in runtime numbers. Other system processes might be taking up CPU time or memory bandwidth. The state of various caches will vary.
Here’s how Netflix keeps an eye on performance in their development pipeline. They run performance (time and memory) tests of some sort on every commit. They run each test three times and take the median. Then they flag a significant change if the result differs by more than four standard deviations from the recent mean-of-medians.
We probably don’t have the same development velocity as Netflix, so some of the level of automation they have isn’t necessary. Having regular time and memory testing (integration and some unit tests) of our code with a clear metric for when things change enough to merit investigation is something that’s attainable for most of our groups, though, if those measures are important to us.
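A minimal sketch of that kind of check, with hypothetical runtimes and loosely following the median-of-three-runs, four-standard-deviation scheme described above (the function name and thresholds are mine, not Netflix’s):

```python
from statistics import median, mean, stdev

def flag_regression(recent_medians, new_runs, n_sigma=4.0):
    """Flag a commit whose median runtime deviates from the mean of
    recent per-commit medians by more than n_sigma standard deviations."""
    new_median = median(new_runs)
    baseline = mean(recent_medians)
    spread = stdev(recent_medians)
    if spread == 0:
        # Degenerate case: perfectly stable history.
        return new_median != baseline
    return abs(new_median - baseline) > n_sigma * spread

# Hypothetical per-commit median runtimes (seconds) for recent commits:
history = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.1]

print(flag_regression(history, [10.0, 10.1, 9.9]))   # within normal noise
print(flag_regression(history, [13.0, 12.8, 13.1]))  # a clear regression
```

Even something this simple, run nightly rather than per-commit, would catch most of the “how long has it been this slow?” surprises.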
How to fix your scientific coding errors - Jeffrey M. Perkel, Nature
There won’t be much new here for you, my colleague, but it’s heartening to me to see how mainstream these once-niche discussions of the challenges of scientific software development are becoming. Here, in a feature in Nature(!), there’s discussion of code review, version control, testing, and automation and repeatability. There are even a couple of favourably described stories of researchers issuing errata for publications after finding a bug, in a pretty transparent attempt to normalize such things (which is great! In Nature!!). Roundup veterans like Lorena Barba (#39) and friend of the newsletter C Titus Brown (#98, #90) make an appearance.
One of the interviewed PIs, Dr Julia Strand in Psychology at Carleton College, was so affected by a potentially catastrophic bug in one piece of her software that she developed an approach to handling errors in research more generally. Her resource, “Error Tight”, aims to reduce the frequency and severity of errors across the research process. This is starting to look more like safety-by-design approaches in safety-critical areas like aviation. That’s a fascinating development, and the resource is well worth a look. Also, it introduced me to PsyArXiv, preprints for the psychological sciences, which I’m ashamed to say I didn’t know existed.
How Software in the Life Sciences Actually Works (And Doesn’t Work) - Elliot Hershberg, New Science
This too covers much familiar ground for readers of the newsletter. The inadequacy of existing research funding mechanisms, which aim at supporting research outputs like experiments and papers, for sustaining research inputs like production research software is pretty well known.
But this is a well-written article by a grad student, and it has a bit of a different perspective than most articles on the topic. For one, it mentions with a fair amount of hopefulness new institutions and funding organizations like Arcadia, Arc, and related efforts. Second, it’s positive about a well-established but often ignored funding mechanism for research inputs - selling a product for money. Having simulation and analysis tools be open-source is important for transparency of science. But still, there are mechanisms like hosted/managed SaaS offerings or charging for feature development which are under-explored in our ecosystem.
Intel oneAPI’s Unified Programming Model for Python Machine Learning - Janakiram MSV, The New Stack
It’s interesting, and maybe surprising to those of us who have been here a while, how big the Python data analysis ecosystem has become, and how robust it is to big changes. Various accelerated computing frameworks, from vendors such as NVIDIA and now Intel with oneAPI, are meeting data scientists where they are, accelerating Python frameworks like pandas or scikit-learn or even NumPy with their lower-level tools rather than trying to move users to something else.
And, in something of a tribute to those python frameworks, this approach seems to be largely successful. The APIs defined by those libraries seem to be perfectly suitable to drastic re-implementation with quite different technologies.
This article has a description of and links to Intel’s oneAPI implementation of acceleration for scikit-learn and pandas. oneAPI is an industry-wide (but, it must be said, Intel-led) effort to have a common and vendor-independent programming framework for a wide range of CPUs and accelerators. The author promises updates with tutorials for installing the toolkits and training models.
NHR @ Göttingen Security Workshop Talks - From 16 Dec 2021
Last December, GWDG held an HPC Security workshop; most of the slides are here. There are three interesting looking talks:
Sarus achieves container portability with near-native performance - Raluca Hodoroaba, CSCS
Sarus - the Sarus team
If you’re not tired of playing with CharlieCloud and Shifter and Podman and Singularity/Apptainer and others, there’s another OCI-compliant container image and execution system which aims to be specifically for HPC, by which it means:
Has anyone looked at this? I’m pretty sure I don’t understand the distinguishing features here. There’s a paper after which I remain confused.
Cray’s Slingshot Interconnect is at the Heart of HPE’s HPC and AI Ambitions - Timothy Prickett Morgan
A good overview of the high performance networking competitive environment in which Cray/HPE’s slingshot interconnect finds itself, and how central it is for HPE’s current plans. There are some hints as to the different forces in play which will shape upcoming generations.
Real-time machine learning: challenges and solutions - Chip Huyen
An increasing use-case for computing, data, and software is online or real-time machine learning - both training and inference. Data and results come in and a model is updated, or a measurement comes in and a prediction or classification is made. This is a challenge for the kinds of systems or analysis tools or data stores that we’re used to dealing with. We’re pretty capable of putting together complete solutions that handle batch modes of operation, but anything “online” or “near-real-time” puts very different requirements on all aspects of software, systems, and stores.
This long read by Huyen talks about the challenges of real-time learning - first with online prediction, then with continually-updated training. She then walks through a staged approach to get from our more usual batch approaches to something like real-time prediction or training.
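As a toy illustration of the predict-then-update loop that distinguishes online from batch approaches (everything here is hypothetical, and vastly simpler than the systems Huyen describes):

```python
import random

class OnlineLinearModel:
    """Toy one-feature linear model updated by stochastic gradient descent -
    a stand-in for the serve-a-prediction-then-update pattern of real-time ML."""
    def __init__(self, lr=0.3):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One SGD step on squared error for this single observation.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

random.seed(0)
model = OnlineLinearModel()
for _ in range(2000):
    x = random.random()
    y = 2.0 * x + 1.0            # hypothetical ground truth behind the stream
    pred = model.predict(x)      # serve a prediction first...
    model.update(x, y)           # ...then fold the new observation into the model
```

The hard parts in production are everything around that loop - feature freshness, monitoring drift, rolling back a model that learned something wrong - which is exactly the ground the article covers.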
A lot of us are going to have to support use-cases like this in the near future, so even if this isn’t a challenge you have right now, it’s worth reading to understand the needs and what’s involved.
Implementations of wordle in google sheets, Word ’97, over ssh, and for the best “personal digital assistant” of the 1990s, the Palm Pilot.
Connect to a database using CSS.
Emacs org-mode as a SQL playground.
The case against ZFS on Linux.
Performance of large JSON queries in Postgres - tl;dr HSTORE is good if you just need string key-value pairs, otherwise JSONB is the winner, and either way you should use compression because once rows get bigger than 2kB everything (not just JSON) gets way slower. PS I like the experimental design baseline choice - prefixes of a BYTEA column.
Open-source is good, but not because it magically conveys improved security.
I’ve been hearing about gitpod as a slightly different approach to some of the same usecases as GitHub codespaces, and with a more predictable (not necessarily better, just more certainty upfront) billing model - anyone have any experience?
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.