#41 - Link Roundup, 11 Sept 2020

Hi, all!

Just a short introduction this week - there’s a full set of links to round up, with a lot on communicating with teams, watching out for burnout in ourselves, and academic communities.

The newsletter is nine months old now. As the list of resources covered here grows, and we see where the gaps are in the topics research computing team members need discussed, my plan over the coming weeks is to trim down the number of links covered each week and to focus on a bit of writing on topics that aren’t widely covered. That would mean things like grant writing for research computing, research community building and the like. That will probably also include interviews with research computing leaders.

What do you think; what would you like to see more of, and what do you think the newsletter could do with less of? Hit reply and let me know.

For now, on to the link roundup -

Managing Teams

Never Skip Retros - Tim Casasola, The Overlap

In his new newsletter, Casasola argues that one of the most fundamental team meetings you can have is the regular retrospective, because:

They disrupt the habit of anticipating the future,
They are low hanging fruit, and
They put teams on the path to continuously improve.

He goes on to suggest tools like Parabol and Fun Retrospectives to help with the retrospective process.

This isn’t exclusively a software development (or even computing) practice; it’s widespread in project management generally, and I think any time there’s a natural place to take stock and look back, doing some kind of a “what went well, what should we take a look at for next time” meeting is well worth the time for the potential improvements.

Informal Communication in an all-remote environment - GitLab

GitLab has long been an all-distributed company, and this section of their handbook on running distributed teams is dedicated to setting up channels for informal communications in those environments.

A couple of the suggestions in here are very simple because they involve taking advantage of meetings you likely already have. Encouraging informal conversations in retrospectives, for instance, where a bit of brainstorming and riffing off of things is just part of the meeting; or starting meetings early to give people who want to join early a chance to chat.

Other suggestions include taking advantage of communities outside of work to connect people - sending people to conferences (not super relevant right now) or having team members bring back ideas from relevant events, virtual or otherwise, that they attend.

Other examples are a wide library of social gatherings you may have read about elsewhere - talent shows, coffee chats, co-working calls, trivia nights, pizza calls.

Incident updates, interruptions and the 30 minute window - Dean Wilson

One management skill I wrestle with is the tension between giving my team members, who I trust, the freedom to solve problems as they see best (this is the easy part for me) while staying informed enough to make related decisions and make sure no one is falling down any rabbit holes (this is the tougher part). Wilson’s article is just a nice story about a previous boss who would consistently, gently, but firmly interject “just enough” during an incident to make sure they knew what was going on so they could communicate upstream, and to keep people on track, while letting the team do their thing.

How to Call Out Racial Injustice at Work - James R. Detert and Laura Morgan Roberts, HBR

At the beginning of the summer there was a flurry of articles on addressing racial or other systemic injustices in the workplace. Unfortunately those have died down a little bit. This HBR article discusses how to call out racial injustice at work - it could just as easily be used to address issues of gender inequality, or to deal with any systemic issue.

The steps Detert and Roberts suggest are:

Use allies and speak as a collective.
Channel your emotions (but don’t suppress them!)
Anticipate others’ negative reactions. (“If your request evokes a furrowed brow or a crossing of arms across the chest, start asking questions: ‘These seem like appropriate next steps to me, but perhaps they feel problematic to you. Can you help me understand what you’re thinking, and why these may not seem right to you?’”)
Frame what you say so that it’s compelling to your counterpart. (“We are evolving together” rather than “I am revolting against you.”)

And finally, and maybe most crucially,

Follow up. A single conversation isn’t going to be enough.

As managers in research computing, most of us are white, and many of us are white men, and so don’t really have to deal with steps one and two - we can speak up when we see issues and our voices will be heard and taken seriously without needing safety in numbers or having to modulate our emotions. Indeed, we have an obligation to do so. Even so, where applicable it would be best to connect with those most directly affected and make sure we’re advocating for the right things, and lending our voices to theirs.

As a bonus, this framework is a very useful one for raising any difficult topic with higher-ups in an organization.

Managing Your Own Career

A 4-Step Process For Avoiding Burnout - Madeleine Evans, The Path Forward

Last roundup we talked about the emotional resilience report, which covered a lot of really good background on burnout. This is a much more tactical article outlining specific steps:

Do a reality check - how often do you find yourself agreeing with statements like “I feel burned out from work” or “I have become more callous towards people lately”, or disagreeing with statements like “I have accomplished many worthwhile things lately”?
Identify your biggest risks - things that cause burnout at work are high demands, unfairness, and lack of control; things that help fight burnout are enough time to rest/recharge, support, a good match between your values and the work you’re doing, and reasonable recognition/reward for your effort. Which of those things are the biggest issues?
Have templates and strong habits for the things which replenish you in the areas above
Plan ahead and review your progress each week on doing concrete things to help avoid burnout.

I wouldn’t say that last week’s article is a prerequisite for this one but I think it’s very helpful for establishing the insidiousness of creeping burnout and gives context to the steps above.

Product Management and Working with Research Communities

Roadwork ahead: Evaluating the needs of FOSS communities working on digital infrastructure in the public interest - Elisa Lindinger, Julia Kloiber, Katherine Waters, Katharina Meyer, Thoka Maer

As mentioned last link roundup, research isn’t the only area where essential digital infrastructure development is under- or un-funded. This report covers the situation in free and open source software generally, focussing on internet infrastructure, but some of the problems are the same; for instance:

“Funders and infrastructure projects communicate differently.”
“A variety of factors prevent infrastructure projects from applying for funding.”

There are also very cogent insights on diversity and inclusion in FOSS projects, which I think are very important, but I also believe that research computing has diversity issues which are more deeply rooted and harder to bypass than those in an open source software project.

Some of the recommendations are, I think, highly relevant:

“Explicitly funding non-technical positions”
“Establishing fellowships”
“Providing examples of good practice for lightweight, result-oriented FOSS project structures”

The report is not overly long and is a very clear read. What I’d like to see as follow up are recommendations on how FOSS infrastructure projects could advocate to funders.

Academic jobs take major hit from Covid-19 - Mićo Tatalović, Research Professional News

A reminder that trainees we work with are facing an even worse job market this year than usual for those looking to continue on the academic track. We’re pretty fortunate in that research computing jobs, particularly in anything connected to health sciences, continue to be offered in strong numbers.

Organic and Locally Sourced: Growing a Digital Humanities Lab with an Eye Towards Sustainability - Rebekah Cummings, David S. Roh, Elizabeth Callaway, Digital Humanities Quarterly

A useful article on setting up a Digital Humanities “pop up” lab in the University of Utah’s Marriott Library, after an earlier attempt had failed. The story told here of learning from (and building on) previous attempts, and using the lab not simply as a thing in and of itself but as a concrete thing for a nascent cross-campus effort to nucleate around, is a nice example of planning and community building to make something as tricky as an interdisciplinary centre take off. This article is part of an issue which has several case studies of digital humanities labs. The group putting this together fended off (or at least de-prioritized) administration views on what was important (visualization wall!) and focused on:

Real Academic Partnerships/Collaboration producing real outputs,
People and trained staff,
Figuring out how to let the lab identity emerge rather than be prescribed (the above partnerships helped with that),
Allowing individual things to be tried and fail while ensuring the effort as a whole was sustainable, and
A portfolio of efforts and outcomes

It’s a good overview of what’s involved in putting together something that connects so many different moving parts.

I continue to watch how data science/data engineering roles evolve, because I think there are a lot of analogies to research computing work specifically. The large amount of experimentation as different kinds of orgs take on and shape data science/data engineering teams can teach us a lot about how to usefully work with our own stakeholders.

A lot has been written on “full-stack” data scientists, and Eugene Yan feels that’s the wrong direction to go in. The important thing isn’t the depth of the tech stack, it’s the beginning-to-end of the journey. The data scientists who can participate in the process from identifying a problem to its eventual solution and deployment into production are the ones who can most easily contribute to the company’s needs.

I think this is especially true in research computing, and easy to forget when we’re increasingly specialized and focussed on our technical tools. Someone who can work with the researchers throughout the entire journey is invaluable. That doesn’t mean they have to do it alone, deeply understanding every technical piece of the problem, but having at the least a “concierge” or “navigator” who stays with a researcher team throughout the process is extremely valuable.

Research Software Development

Dev huddle as a tool to achieve alignment among developers - Mario Fernandez

Fernandez describes how to organize huddles for software developers. The huddles are a place where developers can raise ideas about new tools the team should consider, share interesting techniques they’ve read about, or make decisions about how they’ll handle needed development work. They can be lightweight and self-driven, and serve as a method for building alignment between the developers and sharing useful information and knowledge.

Research Computing Systems

Findings From the Field - Two Years of Following Incidents Closely - John Allspaw

Incident handling is an area where research computing falls well behind best practices in technology or IT, partly because the implicitly lower SLAs haven’t pushed us to have the discipline around incidents that other sectors have had.

And that’s a shame. There’s nothing wrong with having lower (say) uptime requirements if that’s the tradeoff appropriate for researcher use cases, but that doesn’t mean having no incident response protocol, no playbooks, no procedures, and going through the stressful and error-prone approach of making it up as we go along every time something happens is a good way to do things. And I’ve seen many research computing centres where that is precisely what’s done.

This is a short presentation slide deck on what Allspaw has learned from following incident handling closely at multiple organizations.

Some common failure modes he’s seen in leadership are thinking incidents are themselves a bad sign, wanting to get inappropriately involved, and insisting on largely irrelevant metrics. Some common failure modes among front-line incident support are an exclusive focus on fixing over learning, and treating post-incident processes as bureaucracy and busywork.

In Allspaw’s estimation, both groups need to build culture and process around learning from incidents, to create meaningful actions to follow up on what was learned, and to make the most of these unplanned investments in people’s time by making the reviews useful and re-read, and having them inform future work.

Emerging Data & Infrastructure Tools

The HDF Group Announces Availability of HSDS Release v0.6 - HPCWire

HSDS (“Highly Scalable Data Service”), an object-store/S3-flavoured version of HDF5, is nearing v1.0. This takes the well-known scientific computing data format, with its efficient array slicing operations and support for multiple readers and writers, and moves it to distinctly non-POSIX systems. For some applications this may be a relatively straightforward way to migrate away from POSIX file systems, which are extremely expensive in the cloud and extremely challenging at scale. It will be interesting to see how this continues to mature.

Events: Conferences, Training

IEEE 2020 - 14-17 Sept, Virtual, Free

This year’s IEEE 2020 is virtual and free to attend, with workshop sessions on Intel persistent memory and ARM, and talks on efficient inter-node communications, performance monitoring and characterization, HPC workloads, and storage.

ParslFest 2020 - The Parsl Community Meeting - 6-7 Oct, Zoom, free

Parsl is a parallel dataflow/dynamic workflow library for Python supporting a large number of back ends, including common HPC batch queuing systems. The 2020 Parsl community meeting is on the 6th and 7th of October and includes science applications, cyberinfrastructure talks, and tutorials.

Random

Research librarians are putting together “curation primers” for various research data file formats.

Unikernels, which I thought had promise for research computing in production (especially HPC) before seemingly getting killed off by VMs and containers, might be having a day again due to microservices. Nanos looks pretty slick.

Videos from FortranCon 2020 are available online.

The case against dynamic linking.

Systems software always struck me as being more like research computing software (say for simulations or data analysis) than application software, in that the difficult part isn’t complexity so much as subtlety. Here is a blog post on writing comments for systems software.