#116 - 2 Apr 2022

'Contact us for pricing'; How to do less; Becoming a manager; Don't burn yourself out; Is your tech boring enough; Defensive programming; C++20 ranges and Python array strong types; Composable and virtual systems

“Contact us for pricing”.

Say we’re looking at a potential new tool, for us or the team as a whole. When we want to learn about whether it’s feasible as a solution to our needs, we see “Contact us for pricing”. Or occasionally, even for information about the product itself, “Contact us for details.”

“Fantastic!”, we think. “After a few emails and a call or two, not only will we get the information we want, maybe we’ll make a new friend!”

We of course do not think this, because no human being has ever thought this. More likely, we reasonably view it as an imposition on our time, an artificial barrier to getting the information we want. Most of the time we likely bounce, never looking at the page again. Sometimes, we may reluctantly agree to the process, and trepidatiously fill out the form. Perhaps we use a burner or obscured email address, perhaps we don’t. Either way we kind of resent it.

Because it’s not rocket surgery, right? When we’re looking for services or products, we want to know as precisely as possible what the service or product will do for us, and cost to us. Cost meaning both in money and, most valuable of all, our time.

For complex services, we realize we can’t know everything up front, but we still want to have as sense of the process and the timeline. If I leave a message for this plumber, will they call back in an hour or in a couple of days? Will they come to check out the problem tomorrow or a week from now? Will I have to be prepared for their visit at home for an hour, or the whole day? What are the steps going to be? If they won’t tell me up front, I will absolutely choose someone who will. And if it will take a month, well to heck with that, may as well watch some videos online and give it a shot myself.

So we know this, and yet at some level many research computing and data teams behave like they don’t really believe it applies to them.

When researchers come looking at a research computing & data team’s web page, they have some pretty basic things they want to know - what services do they offer? I have this problem - can they help me with it? Have they successfully solved problems that I view as similar in the past? If so, what is the process? What will it take of me, in terms of my and my group’s time, and expertise, and grant money? Am I looking at waiting an hour, a day, a month, a quarter?

These are not unreasonable things to want to know!

I’ve written about this for years:

Have you ever visited a restaurant webpage and needed like 4 or 5 clicks to get to the menu and their hours? If the restaurant took the menu off the website entirely and you instead had to file a ticket so you could ask specifically if they made spaghetti carbonara, that’s what most research computing centre websites are like for researchers.

and it hasn’t gotten better, even for very service-oriented groups like custom research software development teams. “Contact us for details on the product” would actually be an improvement; more often it’s “contact us to find out what we can do for you in the first place”.

These teams commonly feel discomfort at some level with thinking of themselves as a service organization. “We’re more like collaborators”, goes the thinking, rather than something as grubby as a “vendor” that solicits and works with “clients”. Collaborators don’t tout something as tawdry as a list of service offerings.

Such teams are reliably puzzled by how little is known about them in the larger organization, outside of the usual suspects of the teams they already work closely with. “I can’t believe they didn’t know we could do that for them…” - but how would they know, exactly? Where does it say that?

Starting to view our websites and communications as we would if we were looking for a service or product is an important first step towards growing our impact and supporting research more effectively. Starting to build a “service catalogue” or “product list” may seem like a big deal, but getting started is easy. All it takes is to pick one process your team gets a lot of requests about, that your team does often enough to have an informal “usual way”. Put something together. One page with the overall process, initial timeline, and some examples of successful outcomes for that particular thing you do - that’s all it takes. You’ve started. That one entry can clarify things to clients and internally about that process and offering, and iterate on the page. An SOP can be assembled. The catalogue can slowly grow outward from there.

Have you seen teams lacking this approach, or worked in one? How did you see them start developing and sharing some clarity about what they offer researchers? Was any particular nudge, from the inside or out particularly helpful? Let me and the community know - reply or email to jonathan@researchcomputingteams.org.

For now, on to the roundup!

Managing Teams

How To Do Less - Alex Turek
Four Steps to Organizational Change Without the Drama - Deiwin Sarjas

Turek walks through the steps of digging yourself and your team out of a hole via absolutely ruthless prioritization - picking exactly one priority and only advancing that goal. That means advancing it either directly through work on the priority, or through making work more effective by changing how and what work is being done.

The hardest part of doing this is the communications with others, and not letting yourself be sidetracked or buffeted by external requests to dillute your efforts. He helpfully gives useful lines to use, both for management and stakeholders:

This isn’t MAIN_PRIORITY, so we aren’t going to do it until at least ESTIMATED_DONE_DATE. Right now our priority is MAIN_PRIORITY because of ONE_SENTENCE_JUSTIFICATION, and this is 100% our shipping focus. I agree this sounds like a really useful feature - once we finish MAIN_PRIORITY, should we consider dropping SECOND_PRIORITY and do it?

and to the team, after you’ve discussed the new approach with them:

This isn’t MAIN_PRIORITY, that we all said we’re going to work on. What if we don’t do this? What can we do without it? Is this a requirement or a nice to have? Will it speed up MAIN_PRIORITY? Can we put this onto our (New Feature/Maintenance) Roadmap after our current priority finishes? I think we can finish MAIN_PRIORITY without this.

Turek also closes out his article with, almost in passing, an observation I don’t see made often enough. Small teams are often doing “iterate” work, making small changes rapidly to get feedback, and/or “invest” work, where there’s a big push needed to get a single something done. Standing up a system or creating an initial MVP is “invest” work, as is paying off a tranche of technical debt. New features/services or bug fixes/service improvements are “iterate” work. Turek says that at any given moment a small team should be in one or the other - trying to do both simultaneously squanders effort and confuses communications.

Sarjas also emphasizes the importance of communications when making a change (echoing Lara Hogan’s “Don’t YOLO the Comms”). He recommends an outline and process - Situation, Task, Intent, Concerns, Calibrate, or STICC - to organize the communications, and a roll-out plan with the example of promoting a team member to be a manager. You want communication of a change to result in people saying “Makes sense” and not “Wait, What!?”. Being thoughtful in your communications, keeping that goal in focus, can reduce unnecessary drama around big changes. Some drama may happen, and may even be necessary! But you don’t want to contribute to it needlessly through poor communications. As covered way back in #34, good leaders reduce chaos, they don’t add to it.

Making The Leap from Individual Contributor to Engineering Manager - Natalie Rothfels & Doa Jafri, Reforge
How Engineering Managers Are Set Up To Fail - Pat Kua

Speaking of promoting a team member to manager - Rothfels and Jafri talk about the challenges faced by an individual contributor taking on their first management role, and the skills they need to learn. I like their discussion of them, binned into five categories:

  1. Hone concise, clear, context-driven communication (including bridging communities and having tough conversations)
  2. Developing emotional regulation and self-reflection (someone can’t lead others if they’re constantly freaked out)
  3. Learn the craft of facilitation for both technical and non-technical discussions (the job now is to make sure the right conversations happen and go well, not necessarily contribute to the conversations)
  4. Create and maintain strong organizational systems (from #110, you manage processes and lead people)
  5. Shift your mindset around progress, ambiguity, and change. (This is a bit easier for those who have worked around research for a while!)

These are a lot of big skills to learn! New managers or team leads need a lot of support as they go through these changes. On the other hand, Kua talks about the main way he sees managers set up to fail. They are:

  1. Thrown into the deep end
  2. ..with no support, and
  3. ..with no push to break bad habits.

Honestly, the above describes the situation of pretty much every new manager in academic research computing, and many outside of academia.

Were you given any support, training, help, or peer support in your first research computing lead or manager job? What worked for you and didn’t? Let me know!

Managing Your Own Career

Why does burnout happen? - Ashley Janssen
know how your org works - Cindy Sridharan
Maybe you should do less ‘work’ - John Whiles

There are absolutely people in jobs that make unreasonable demands of them, whose job will chew them up and spit them out with burnout. c.f. the last two years in healthcare.

And, in research computing, I see a lot of smart high achievers choosing to take responsibility personally for unfeasible amounts of accomplishment with far too little resources. For pushing ambitious agendas through organizational structures that are at best unsupportive and at worst actively opposed.

When internal expectations and external possibilities are unrealistically mismatched, people can burn out. It’s not necessary, but it happens all the time. Janssen pushes us to take a cooler look at what’s possible, what you want, and whether the two are aligned.

To do that requires a gimlet-eyed look at the situation and organization you’re part of. Sridharan has good advice about knowing how your organization actually works and what it actually values. Knowing this will tell you how to effectively pitch projects, how to build relationships with teams you need to collaborate with, and how to demonstrate success so that you can get resources for the efforts you want to advance.

In the last article, Whiles talks about the importance of taking some time out of your day from work tasks to do exactly the kind of learning Sridharan advocates for. Developing organizational knowledge, learning new skills, and building cross-organizational relationships.

Technical Leadership

The Boring Technology Checklist - Brian Leroux
A database for 2022 - David Crawshaw, Tailscale

Is the technology you use boring enough?

I really like Dan McKinley’s 2015 talk, Choose Boring Technology, especially the bit where he recommends frugally and reluctantly allocating “innovation tokens” to use in part of a solution. Using shiny newness is expensive. It means constantly fighting against the unknown and solving problems you didn’t know you were going to have. It’s swimming upstream.

This is especially true in research computing! The researchers are solving a new problem, using a new technique. That means the underpinnings should be rock solid and reliable. If there’s something surprising in the output of a result, is that due to intriguing new behaviour in the system being studied, or is it brokenness in the researcher’s new method? Wherever it is, it shouldn’t be from new libraries or systems you’ve unnecessarily provided because you were really excited about using them. That stuff should be capital-B Boring wherever at all possible.

Leroux provides a checklist to ensure that the technology you are choosing is boring enough. Less cheekily, its a way of highlighting common downsides of choosing a new and cool-looking tool that doesn’t have the history, documentation, and well-understood workarounds of a solution that’s been around the block.

Cranshaw’s article describes the database choices made over time for powering Tailscale. Their approach goes even beyond “boring”, having a very strong preference for “as simple as possible”. It turns out that’s frequently simpler than you’d think! For 18 months, and a few orders of magnitude in growth, their “database” was a JSON file. Then they used etcd. They are now powered by sqlite, using litestream for streaming replication. As you know, this is principally a sqlite/embedded db fan newsletter, so Cranshaw’s article getting mentioned was a foregone conclusion.

Research Software Development

Defensive Programming and Debugging - Geert Jan Bex, Mag Selwa, Ingrid Barcena Roig

This is an online book to go alongside a PRACE MOOC on defensive programming and debugging specifically for C/C++/Fortran HPC-type codes. The Five-week MOOC has a very modest cost, but the book is completely free. Covered is coding style, error handling, unit testing, using the compiler to help, and debugging tools and techniques. The book is a work in progress and they’re looking for contributions.

In A style guide for creating new backlog tasks, the author recommends a couple of simple and clear style guide recommendations for new backlog tasks. The first is that the title should be an outcome-focused, actionable sentence (“Update copy on signup screen, “Investigate ways to increase speed of CI for each build”). The second is to keep the description relevant to the audience the ticket is for (e.g. distinguish between a product backlog and a developer’s ticket), and keep it clear.

What else should go in a backlog style guide?

A plea to make units explicit in parameter names. One of the few things I learned in my physics training that I still routinely use is dimensional analysis. Units matter!

I’m fascinated by where the typing work in Python is going, and in particular Variadic Generics aimed specifically at NumPy. Making the type and shape (not just rank!) of an array allows static checkers to go Fortran-style deep into function calls, which is amazing. That connects to the new PyData Array API standard to allow some pretty cool things (including GPUs, yes, but also other accelerators)

C++20 range composition with the pipe operator is also pretty exciting! Functional programming-like composition in C++!

Research Computing Systems

Thread Alert: First Python Ransomeware Attack Targeting Jupyter Notebooks - Assaf Morag, Aqua
GitLab addresses critical account hijack bug - Adam Bannister, Daily Swig

Two serious software vulnerabilities in packages relevant to reaserach computing teams - self-hosted GitLab, and Jupyter notebooks.

A lot of RCD teams have Jupyter notebooks easily set up for labs with inadequate authorization, because it’s just a notebook, right? The ransomware attack is a surprise - I would have just assumed someone would do crypto mining. The Aqua team set up a honeypot and watched an attack in progress if you want the details.

The GitLab bug really requires the interaction with OmniAuth, resulting in hardcoded passwords set and credentials taken over.

Emerging Technologies and Practices

Putting Composability Through the Paces on HPC Systems - Jeffrey Burt, The Next Platform
Will Open Compute Backing Drive SIOV Adoption - Daniel Robinson, The Next Platform

Burt reports on two efforts in Texas, where Liqid (for NSF’s ACES) and GigaIO (for a Lonestar6 pathfinder) are building composable computing prototypes or testbeds. These are disaggregated systems of network, compute, storage, accelerators, and persistent memory which are tied together by extremely high speed fabrics, so that you can compose combinations of them as needed. Kind of software defined nodes.

The idea here is to have cloud-like flexibility in instances. Rather than having a whole bunch of nodes of one or a small number of kinds that users have to figure out how to fit to their job, the idea is to fit the hardware to the workflow at runtime.

This has been tried before - Infiniband was partly conceived as an off-box PCI, decomposing server hardware onto an IB fabric. But networks are now much faster than they were in 1999, when 8Gb/s was a big deal but clock speeds were already reaching today’s plateau. If this ended up being a feasible approach, it would make on-premises systems much more attractive for the highly-mixed workflows of research computing.

Related to these efforts is device virtualization. Robinson’s article talks about Intel and Microsoft’s Scalable I/O Virtualization (SIOV) specifications, donated to the Open Compute Project, for faster and more scalable virtualization of devices. If you have some accelerator or persistent memory device on one of these composable systems, you may well want to allow two different ‘nodes’ to pick them up, with two different virtual machines running on them. If that’s the case you’ll need some way to partition them (which is an internal-to-the-device problem) and expose them as separate (which is external). That then allows them to be shared in a way that’s less prone to noisy neighbour problems, and better for security. It’s that second issue that SIOV, and the current standard for PCIe devices, SR-IOV, aims to solve.


Well that should do it: RFC 9225, released on Friday - “Authors MUST NOT implement bugs”.

Someone brought OpenZiti, a suite of libraries and tools for zero-trust (ZT - get it?) overlay network and tunnelling. Do you know anyone who’s used it in a research computing context, presumably for sensitive data and systems? Would love to hear of any use cases.

Run Doom in your Grafana dashboard, for some reason.

Add an sshd to your secure web server and proxy, for some reason.

This is interesting - Amazon has a postdoctoral science program.

In praise of learning APL.

A deep dive into unwinding a stack and frame pointers using the Linux kernel community’s new ORC debug format.

A song composed of nothing but dial-up modem samples.

If your work uses Google Workspace for mail/calendar/etc, you’ll now have calendly-style booking options built in.

Fascinating network debugging story, within AWS, that seems to be a chef or linux issue but ends up being about MTU discovery and AWS Transit Gateways.

Something I hadn’t really considered before is how Python’s byte code makes possible a number of very accessible introductions to compilers. Here’s an tutorial on building control-flow graphs from a subset of Python.

That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,


About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

Jobs Leading Research Computing Teams

This week’s new-listing highlights are below; the full listing of 134 jobs is, as ever, available on the job board.