#156 - 18 Feb 2023

Hi, everyone!

A couple of reader replies from the last issue - here’s long-time reader Scott Delinger, CEO of Canadian regional academic HPC consortium Prairies DRI (digital research infrastructure), responding to an article about remote work:

Our situation in Canadian ARC [Advanced Research Computing] is slightly unusual, in that we’ve ALWAYS had remote work, even when staff were sitting in the same room prior to COVID. So the main difference is the effort required around group social aspects, not work tasks.

This is exactly right - Canadian ARC is an extreme example, but in our line of work, distributed collaborations are pretty common. That experience gives us a huge advantage when it comes to remote work more generally; we can make use of it by applying what we’ve learned there (around collaborating in a more document-based way, asynchronous discussions, etc.) more consistently in other contexts. Some things are still hard remotely - social cohesion, mentoring juniors - but we can build on our existing experience. A lot of other communities didn’t have the benefit of that starting point.

In addition, in response to what I wrote last issue (#155) about strategic plans, another long-time reader, Adam DeConinck, writes in to say:

Regarding strategy docs, one of my favorite ways to approach this is from Will Larson’s StaffEng site: https://staffeng.com/guides/engineering-strategy. This effectively takes the approach that strategy should be inferred, bottom-up, from the actual design work that your team is doing. Rather than designed “top down” as a new set of ideas.

I’m a huge fan of this technique! Larson’s advice is basically to take the last five design documents (or project intake decisions, or services delivered, or…) and synthesize them into a strategy, essentially inferring what the actual implied strategy currently is. The vision then extrapolates where continuously applying that strategy would lead.

Far too many times, I have seen teams write very aspirational strategic plans or other kinds of strategy that were disconnected from what was actually happening on the ground. No matter how compelling the future vision is, facts on the ground are always going to win out over a piece of paper. Any real change has to come from knowing both where we currently are and where we want to go, and having a clear plan for how to move from one state to the other.

It’s vital to understand where we’re starting from before setting out on a new strategy. If you’re unsure of what the current strategy is - maybe because change is needed, or maybe because things seem to be working well and you want to codify current practice so that things stay on track - this approach of taking what is actually being done and unearthing the implicit strategy that seems to be playing out is an incredibly useful step. Warning: the strategy you uncover may be very different from what you think it is.

Finally, I’d like to introduce an experimental sibling newsletter, Manager, Ph.D. (MPHD).

As you know, people with our backgrounds have certain strengths and certain gaps when it comes to managing a team. It makes sense to support our peer group with these more general management challenges so we can have as much impact as possible!

And that includes groups well outside of the RCT wheelhouse. So my intention for now is that material not specifically about our RCD teams or other expert teams within academic research will appear there more often than here. I’ve “launched” MPHD with a backlog of just the general management topics from previous RCT issues; that means that a few issues are missing entirely, which is why the issue numbers are different.

I’ll keep posting general “person from the world of research managing teams in an organization” material preferentially in MPHD.

Material that will show up here in RCT more consistently will cover topics like:

Management, and technical or project/product leadership, of software, systems, and data (science, stewardship, engineering) teams
Strategic planning and prioritization in an academic research context
Running an expert services organization

I’ve heard from several people that they didn’t mind, or even preferred, having everything in one place. But I need to try something different for my own sake - I was trying to keep the management material as general as possible to help as many people as possible, while also keeping the RCD focus. I was driving myself bonkers trying to do both at once in one email.

So we’ll keep things increasingly separate for the next couple of months and reassess.

I can’t reassess, of course, without your feedback! So as always, please let me know what you think - and not just about how the split of topics is going. For instance, those of you who do decide to keep an eye on MPHD, let me know the pros and cons of using Substack, which is where I’ve got it stashed for now.

And with all that, on to the roundup!

Managing Teams

Research Computing Teams Interviews: UCL ARC - James Hetherington, Jonathan Cooper, Donna Swann, and Chris Langridge

When I spoke with Ian Cosden (#154) about the RSE program at Princeton in general, one of the topics we covered was the career ladder he’s developed for RSEs at Princeton, the preconditions that had to be in place first, what he had to do to make it happen, and what’s now possible.

In this interview with James Hetherington, Jonathan Cooper, Donna Swann, and Chris Langridge from UCL ARC, we focus specifically on their work with career ladders, and on the work they’ve put in over the past several years to improve hiring, retention, professional advancement, and job role clarity on their team. The ARC team, with its larger remit, had to consider careers for software, systems, data stewards, data scientists, and research management staff in parallel.

Here too, many preconditions had to be in place first, and the work had to be done methodically. UCL ARC took small steps which at each point solved some problem for them, and laid the foundation for the next:

Established credibility with Finance and HR by consistently having the money to hire when they said they did, and by hiring consistently over time - which made it easier to ask for flexibility and investigate options;
Built relationships with those partners through consistent conversations - making further collaboration and streamlining of processes easier;
Established “blended recruitments” where they could hire at various levels - allowing for more flexibility in hiring and for “promotions” of a sort;
Defined clear career ladders along which people could advance - helping with retention and clarity, and establishing official mentoring relationships;
Introduced “always-open” job postings - making the hiring process easier for both ARC and candidates.

While there are institutional differences, it’s the similarities that are more striking. Career ladders for these new roles can’t be started right away, are a lot of work, and the levelling requirements will depend a lot on local needs and context. But once done, they’re a powerful instrument for change not only for the working conditions in the team that originates them, but across the institution.

Structural Problems Don’t Yield to Local Solutions - Jonathan Dursi, Manager, Ph.D.

This is the sort of article which I think will appear more in Manager, Ph.D. than here - general “person from the research world managing a team in an organization” advice.

I usually focus on the key tools we need, and on what we can change. That means focusing more on our teams and our relationships with peers. Opening up those lines of communication to productive conversations is always the first step. But some problems are bigger than what can be fixed that way. One-on-one-ing harder won’t change them.

You can tell there’s a bigger issue when:

The problem doesn’t change
The problem keeps reappearing in the same place
The problem keeps appearing in other places

The key things to remember when you’ve identified such a bigger issue are:

It’s not your fault - this problem wasn’t caused by you, and you might not be able to fix it
Not fixing it is always an option
If it does get fixed, it’ll be by tackling it head-on, which will require other people in the org to work together

Doing More With Less (Webinar) - Clare Lew

With things changing dramatically in tech, we’re going to see a lot more articles from tech management on ruthless prioritization - which suits me just fine.

Our teams of experts are pretty much always small, and often chronically under-resourced compared to what we’d like to accomplish. And yet a lot of the managers I speak with still need help with not trying to do everything.

I’ll talk about this more next week, but if we’re going to have the maximum impact with finite resources, we need to be almost monomaniacally focussed on doing the work that makes the biggest difference, and getting really good at it. Doing a little bit of this and a little bit of that won’t cut it. It fails to maximize impact in two different ways - it doesn’t focus the effort where it can do the most good, and it means our team can’t improve and grow as quickly as it could if we were focussed on one area.

Lew’s talk on this - not just on focussing, but on scaling that focus - is great. About half the talk is about dealing with the possible morale fallout from a sudden big change, which isn’t super relevant to our context. But the rest is dead on:

Question “More” - do more of the most important things
Framework & Feedback - make sure people know how to think about priorities so they can make the right prioritization decisions autonomously, and give feedback on how they’re doing
Scale the mindset - make sure managers know what is expected of managers, are doing 1:1s, giving and receiving feedback, are communicating the priorities, and are coaching

Technical and Project Leadership

Writing an engineering strategy - Will Larson

Larson, whose advice above was so useful for understanding where strategy is now, talks about creating strategy to address new challenges. He’s influenced strongly by Rumelt’s Good Strategy/Bad Strategy book, which has made several appearances here, and writes his strategies in three sections:

Diagnosis - what’s the nature of the challenge this strategy is aiming to solve
Guiding policies that we’ll use consistently over the life of this strategy to tackle the challenge - the policy should be able to be applied when making tricky decisions in the months ahead
Coherent actions - our first next steps

There’s lots of good material in here, and if you’re interested it’s a good read. I want to emphasize one point for our context.

A disproportionate amount of what’s written out there on strategy is for corporate leaders, who can take their companies in entirely new directions if they like. That’s not an option available to us - and indeed not to most managers and leaders. We’re part of larger institutions, and need to work together. That’s true in the private sector, too. Here Larson emphasizes that the engineering strategy has to support the business strategy. A brilliant engineering strategy which doesn’t help the business isn’t, in fact, a brilliant engineering strategy.

It’s the same for us. Our teams’ strategies have to support the institution’s strategies; otherwise they’re working at cross-purposes. At best, there’s wasted effort.

Institutions differ, but none of us have VPRs whose strategy is “support all researchers the same”, or CIOs whose strategy is “a bit of this and that - whatever people ask for, really”. At any given moment, those two organizations are marshalling efforts to help the institution grow and thrive in ways that meet the possibilities and the challenges the organization sees.

The more we can align our team’s strategies with those of the organization we’re part of, the more likely we are to be successful, and the more impact our successes will have. We’ll come back to this more next week.

Asking Your Project Team to KISS (Keep/Improve/Start/Stop) - Mark Warner

As you know, just as one-on-ones are key for managing or leading individuals, retrospectives are absolutely key to managing and leading effective teams (#137). Here Warner talks about retrospectives particularly in project work. Constructing lessons-learned documents at project closeouts is part of this, especially if there are other teams that could benefit from what your team has learned, but Warner points out this should be a continuous, ongoing process.

Warner uses the K/I/S/S formulation for brainstorming in these sessions, but there are many such frameworks for getting input, and it can be beneficial to cycle between them to keep the meetings fresh (#61).

Warner emphasizes the importance of documenting what comes up and presenting results, especially on changes made in response to people’s suggestions. This helps make team members feel heard and will encourage more contributions, and so faster improvement in the team’s work.

Warner also lists two other caveats:

Expand the list of participants - for project-based work, having stakeholders or clients contribute their input is really important, whether it’s at closeout or separately
Make sure you hear from everyone, not just the loud voices - give others ways to contribute their suggestions

Managing Your Own Career

Building a Great Relationship With Your Boss - Paulo André

We’re often experts, leading a team of experts, managed by someone who isn’t an expert in our area. The good news is that we’re often given a lot of autonomy to run our team’s work as we see fit. But it also means that we can become quite detached from our bosses’ goals and needs, and by extension that of the organization.

If we’re going to get the support we need from the larger organization, and in particular from our boss, it is really important to line up our team’s work, at least in part, to support our boss’s goals. That means, amongst other things, knowing what they are!

André gives some advice about developing a great relationship with your boss, even if right now you don’t talk to them much. He gives four pretty good questions that we really should know the answers to (and the answers will change over time):

What keeps my manager up at night?
What is success for them now and in the long run?
What pressures are they subject to? Where do they come from?
What do they expect from me? What do they hope to have from me?

How to write a great extended leave document - Ben Balter

In #132 we talked about the usefulness of vacation as a way of practicing delegation if you’re not doing it already. Here Balter shares his template for a “going on leave” document, a short document that you can keep up to date and then share when you’ll be going away:

Dates
Contact preferences
Points of contact
Regular meetings
Rolodex
Stuff being worked on
Other important

What else would you keep on this document? Are there other things that could be usefully documented here? Let me know - just hit reply or email me at jonathan@researchcomputingteams.org; I think an RCT template for this could be really useful.

Research Software Development

How software engineering behavioral interviews are evaluated at Meta (from an ex-Meta manager) - Lior Neu-ner

Behavioural interviews are underrated in our line of work, where we focus on expertise. But behavioural interview questions are great ways to have people demonstrate how they used that expertise in real situations.

The key for these being useful, however, is to (a) have good and relevant questions that would give some signal as to how candidates might succeed or struggle in the actual job, and (b) to have a good rubric ahead of time, with agreement about what would and wouldn’t be a good answer.

Neu-ner describes how at Meta they ask behavioural questions to assess motivation, proactivity, ability to take ownership in an ambiguous situation, perseverance, conflict resolution, empathy, growth, and communication. And crucially, they have expectations for junior, senior, and staff-level answers to these questions.

Not all of this will apply directly to our teams, but I like the breadth of the questions here, and the levelling. One thing I’d add is that you’ll get better answers if you let people know why you’re asking and what you’re looking for - e.g. not just “Tell me about a time when you wanted to change something that was outside of your regular scope of work,” but “Our work here means that people get pulled into a wide variety of work, and so it’s important that our team members are comfortable tackling new challenges. So, tell me about a time when you wanted to change something that was outside of your regular scope of work.”

I’ll also add that it can be very useful to share the questions ahead of time - you’ll get better and more relevant answers. The real value of a question is not in the immediate answer, but in the follow-up questions and back-and-forth as you dig into how they did that thing, and why they decided to do it over something else. If people know the questions, you can get more quickly to the meat of the interview: those follow-ups.

Research Data Management and Analysis

Ah, this is interesting - a JupyterLab desktop application, with its own JupyterLab setup for running locally, which can also open sessions on remote JupyterLab servers. Amongst other things, this means an easier start for new users, and (I think?) you can set up your UI settings the way you like them in the desktop app once and for all and have the remote sessions honour them…

Data Migration Tips - Josh Tolley

Migrating data sets that people rely on is, not to put too fine a point on it, kind of scary. A lot can go wrong.

Tolley gives his hard-won advice:

Be careful and thoughtful about communication, both in terms of expectations and in terms of technical terms
Have a plan that is a sequence of steps
- Each step should leave you with a usable system!
- Each step should be roll-back-able
Document the decision making process
Maintain a history for the migration
Consider having a staging data store
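Tolley’s “sequence of steps, each usable and each roll-back-able” advice can be sketched in a few lines of code. This is a minimal Python sketch, not Tolley’s own method: the in-memory dict standing in for a data store, the `Step` structure, and the copy-then-switch example steps are all hypothetical. Each step carries its own rollback and a check that the system is still usable, and every action is appended to a migration history.

```python
from dataclasses import dataclass
from typing import Callable

Store = dict  # hypothetical stand-in for a real data store


@dataclass
class Step:
    name: str
    apply: Callable[[Store], None]
    rollback: Callable[[Store], None]
    check: Callable[[Store], bool]  # is the system still usable after this step?


def run_migration(store: Store, steps: list[Step], history: list[str]) -> bool:
    """Apply steps in order, logging each one; undo and stop at the first failed check."""
    for step in steps:
        step.apply(store)
        history.append(f"applied {step.name}")
        if not step.check(store):
            step.rollback(store)
            history.append(f"rolled back {step.name}")
            return False  # store is back at the last usable state
    return True


def example_steps() -> list[Step]:
    """Hypothetical two-step plan: copy the data, then switch readers over."""
    def copy(s: Store) -> None:
        s["new"] = list(s["old"])

    def uncopy(s: Store) -> None:
        s.pop("new", None)

    def switch(s: Store) -> None:
        s["active"] = "new"

    def unswitch(s: Store) -> None:
        s["active"] = "old"

    return [
        Step("copy", copy, uncopy, lambda s: s["new"] == s["old"]),
        Step("switch", switch, unswitch, lambda s: bool(s[s["active"]])),
    ]


store = {"old": [1, 2, 3], "active": "old"}
history: list[str] = []
print(run_migration(store, example_steps(), history))  # True
print(history)  # ['applied copy', 'applied switch']
```

Because every step is supposed to leave a usable system, a failed check only needs to undo the step that just ran; the migration then stops at the last good state instead of unwinding everything, and the history records exactly how far it got.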

A reader chimes in with three other points for working with stakeholders:

Provide a full copy of all the data as soon as possible, even if it’s super messy
Provide updated versions of the data for testing so people can catch breaking changes
Scripts and data mappings should be visible to everyone - people can and will spot issues early

Research Computing Systems

Recruiting developers into Site Reliability Engineering (SRE) - Ash Patel, SREpath

One of the roles I consistently see our teams struggle to hire for is “DevOps” type jobs, where the individual has to have a foot in both software development and operations.

We don’t typically need Site Reliability Engineers, who are more about keeping systems operating at high levels of reliability. But SREs have a similar mix of capabilities, combining development and infrastructure operations, and I think we can learn useful tips from Patel’s article on recruiting developers into these cross-cutting jobs.

In Patel’s estimation, it’s easier to (successfully) pull developers into these roles than sysadmins - that’s been my experience, as well. And besides the improved job prospects (literally everyone is trying to hire for these kinds of roles), these jobs are actually kind of fun - there’s more ambiguity and more complexity when the entire system, from infrastructure to software to external network connections, is in scope.

Patel encourages developers who might be interested to:

Learn a systems-oriented language (Go, Rust)
Join DevOps/SRE communities
Follow cloud-native projects
Put together and break sandboxed systems (maybe with something like Sad Servers, below)
Build sysadmin skills (there’s a lot of courses out there now)

On our side, Patel counsels us to:

Make sure they have mentors
Give them clarity on what success looks like, and continual feedback
Start them with very narrow scope
Slowly expand that scope as they grow

This could be great for training, or maybe even for getting ideas for interviewing - Sad Servers is “Like LeetCode for Linux”, a set of 18 Linux sysadmin problems where a server is spun up for you and you have to figure out the problem.

AGBT2023, one of the year’s big genome sequencing conferences, just ended. It’s a little early to find retrospectives, but a lot of the commentary (like this twitter thread) is pointing out that there are now a few plausible candidates for technologies that will sequence a human genome for $200 or $100 in consumables costs. That number had been hovering around $1000 for a long time.

So for those of us with large genomics users, there’s a decent chance that in the coming few years, some of them will be collecting 5-10x as much data for some of their projects.
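The arithmetic behind that estimate is simple: if a group keeps its consumables budget fixed, the number of genomes it can sequence scales inversely with the cost per genome. A minimal sketch, assuming data volume is roughly proportional to the number of genomes sequenced; `gb_per_genome` is a hypothetical parameter you would fill in from your own users’ pipelines.

```python
def genomes_for_budget(budget: float, cost_per_genome: float) -> int:
    """Whole genomes a fixed consumables budget can pay for."""
    return int(budget // cost_per_genome)


def data_multiplier(old_cost: float, new_cost: float) -> float:
    """How many times more genomes (and, roughly, data) the same budget buys."""
    return old_cost / new_cost


def storage_needed_gb(budget: float, cost_per_genome: float, gb_per_genome: float) -> float:
    """Raw storage for a budget's worth of genomes; gb_per_genome is a local assumption."""
    return genomes_for_budget(budget, cost_per_genome) * gb_per_genome


# Moving from ~$1000 to $200 or $100 per genome:
print(data_multiplier(1000, 200))  # 5.0
print(data_multiplier(1000, 100))  # 10.0
```

That is where the 5-10x range comes from; whether a given group actually spends the savings on more genomes, rather than banking them, is of course up to them.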

Random

“We find that open source [C] code containing swearwords exhibit significantly better code quality than those not containing swearwords under several statistical tests.“

A nice 80-page introduction to deep learning models for people with a computational physics background.

A walkthrough of a 1950s analogue computer with 2,781 parts for determining the airspeed and altitude of fighter planes - The Bendix Central Air Data Computer (CADC).

A lot of groups are beginning to have to think about handling sensitive data for the first time. This short UN guide to privacy-enhancing technologies for sensitive data analysis (like differential privacy, homomorphic encryption, secure multiparty computation, and distributed learning) is intended for decision makers at national statistical agencies, but it’s a pretty good crash course in the differences between the tools.
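Of the techniques the guide covers, differential privacy is perhaps the easiest to show in a few lines. The sketch below is the textbook Laplace mechanism applied to a counting query, offered as an illustration rather than anything taken from the UN guide; the function names are hypothetical, and a real deployment should use a vetted library, since naive floating-point noise like this has known weaknesses.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Iterable, predicate: Callable[[object], bool],
                  epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Epsilon-differentially-private count of records matching `predicate`.

    A counting query has sensitivity 1: adding or removing one person's record
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example: a noisy count of even IDs among 100 records.
rng = random.Random(0)
print(private_count(range(100), lambda r: r % 2 == 0, epsilon=0.5, rng=rng))
```

Smaller epsilon means more noise and stronger privacy, and every query answered this way spends part of the overall privacy budget.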

Relatedly, with fully homomorphic encryption, mathematical expressions need to be translated into operations which perform the corresponding calculation within the cryptosystem - here’s an open source FHE compiler for C++ from Google.
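What “translating expressions into operations within the cryptosystem” means can be seen with a much simpler, partially homomorphic scheme. The toy Python sketch below uses Paillier encryption, which supports only adding ciphertexts and multiplying by plaintext constants, so it is not FHE and not how Google’s compiler works, and the tiny primes make it completely insecure. It evaluates `3*a + b` on encrypted `a` and `b`: addition becomes ciphertext multiplication, and multiplying by a constant becomes exponentiation.

```python
import math
import random


def keygen(p: int, q: int) -> tuple[int, tuple[int, int, int]]:
    """Toy Paillier keys from two small primes (insecure; illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with generator g = n + 1, mu is simply lam^-1 mod n
    return n, (n, lam, mu)  # (public key, private key)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(priv: tuple[int, int, int], c: int) -> int:
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def he_add(n: int, c1: int, c2: int) -> int:
    """Enc(a) * Enc(b) decrypts to a + b."""
    return c1 * c2 % (n * n)


def he_mul_const(n: int, c: int, k: int) -> int:
    """Enc(a) ** k decrypts to k * a (k is a plaintext constant)."""
    return pow(c, k, n * n)


# "Compile" the plaintext expression 3*a + b into ciphertext operations.
rng = random.Random(1)
pub, priv = keygen(293, 433)
enc_a, enc_b = encrypt(pub, 7, rng), encrypt(pub, 5, rng)
enc_result = he_add(pub, he_mul_const(pub, enc_a, 3), enc_b)
print(decrypt(priv, enc_result))  # 26
```

A fully homomorphic scheme also supports multiplying two ciphertexts together, which is what lets a compiler handle arbitrary programs rather than just linear expressions like this one.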

I find this 10 min AWS HPC video about life sciences customers’ adoption of the cloud really interesting. They find that even in this pretty specialized sub-area, there are extremely heterogeneous workloads having to run on pretty homogeneous machines, with all the wasted resources that implies. They also talk about the usefulness of a mix of reserved, on-demand, and spot instances.
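A quick way to see the waste that comes from running heterogeneous workloads on homogeneous machines: if every job gets its own node of a single fixed size, the idle fraction is one minus total requested over total provisioned. A minimal sketch, assuming one job per node and looking at memory only; the job sizes in the example are made up for illustration, and pricing (reserved, on-demand, spot) is not modelled.

```python
def packing_waste(job_mem_gb: list[float], node_mem_gb: float) -> float:
    """Fraction of provisioned memory left idle when every job gets its own
    node of one fixed size (memory only; CPU, GPU, and time are ignored)."""
    if not job_mem_gb:
        return 0.0
    if any(m > node_mem_gb for m in job_mem_gb):
        raise ValueError("a job does not fit on this node type")
    provisioned = node_mem_gb * len(job_mem_gb)
    return 1 - sum(job_mem_gb) / provisioned


# Made-up mix of job memory needs on 64 GB nodes:
print(packing_waste([4, 8, 16, 64], 64))  # 0.640625, i.e. ~64% of memory idle
```

Matching instance sizes to job sizes drives this fraction toward zero, which is part of why a varied fleet can beat a single node type for mixed workloads.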

That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 165 jobs is, as ever, available on the job board.