#170 - 24 Sept 2023

Risk Management and Gratis Offerings Plus: Making connections; Understand the underlying dynamics before trying to Change Things; Hiring RSEs and Data Scientists; Influencing Stakeholders; Mojo; MLIR; We’ve disparaged waterfall unfairly; Research forges; Python + Excel = TLA; Overcoming Ops Debt

I’m back from summer break, recharged and refreshed - I hope you are, too!

Now that I’m back, I’ll be testing out a biweekly cadence for this newsletter, alternating with Manager, Ph.D; we’ll see how that goes. As always, I greatly appreciate your feedback; please don’t hesitate to email me at jonathan@researchcomputingteams.org, or schedule a quick call with me to give me input or ask me questions!

It’s fantastic today that there are so many free (gratis) tiers of service and packages of software, open source or otherwise, that we can use as the foundations for the computing, software, or data services we offer to our research communities.

It really is! I feel that viscerally, because when I was coming of age in this community, proprietary and only barely interoperable OSes, compilers, libraries, resource managers, data platforms… were the norm, not the exception.

So I’ve never taken those free offerings for granted. It’s a fantastic development that I’m constantly conscious of. In some cases, it’s a vibrant distributed community that makes these offerings possible, with each contributor putting in largely volunteer effort off the side of their desk, each person part of a massive international collaboration made possible by nearly ubiquitous internet access.

There are real downsides to putting so much responsibility on the shoulders of unpaid volunteer labour — maintainer and contributor burnout being a big one — but there are communities and packages where it works well and empirically appears sustainable.

In other cases, it’s a clear business decision, with a company freely offering a software package or service (often with some open source), with paid services or support built atop. The idea here is to fund the development of the open software or services (and then some) with paid offerings.

Either way, these offerings are great boons for those of us supporting research with computing, software, or data services.

But people’s time is (rightly) valuable and one way or another needs to be compensated. Similarly, equipment takes money, both for the equipment itself and for paying the people who operate it.

These are facts that we’re exceedingly aware of within our own teams, but too often forget when it comes to external offerings.

From the second tech boom (post the pets.com bubble) and then through years of low interest rates, money was pretty easy to come by for tech companies. That meant that companies (or more properly their investors) could be pretty ok with high customer acquisition costs, like generous free tiers or open source software offerings; it meant that companies didn’t mind paying for some of their people to do open source development if it helped with recruiting; and it meant that other companies could be pretty ok paying for paid tiers of offerings even if they didn’t need them, “just in case”.

But now things have changed.

We’ve seen this with last year’s seemingly endless rounds of layoffs in tech companies, but it started before then. Docker no longer storing unused images indefinitely for free (#37) and then charging for Docker Desktop (#90); Heroku discontinuing their free tier; a number of CI/CD offerings chopping their free tier limits way back; GitLab saying (and then partly walking back) that they were going to expire dormant repos; Google messing around with free G Suite/Workspace plans; and of course we remember Binder’s struggles. Lots of small but key software packages have been archived, or are simply no longer being updated. This has been building for some time.

And while I was on summer hiatus, the latest Red Hat licensing brouhaha erupted.

That Red Hat has put another bump in the path of generating free versions of their distribution has a lot of people talking, because it directly impacts our systems and plans.

(A lot of that talking went along the lines of “IBM/RedHat is shooting themselves in the foot in the RCD community with this move”. My siblings in Science — why would a software or services company spend five seconds thinking about a market whose primary defining characteristic is a passionate refusal to spend money on software or services?)

There’s been very little discussion I’ve seen about anything closer to home, where we have much better knowledge and (more importantly) the very real ability to change things.

Crucially, only once or twice have I heard (and exclusively in private), “We had that risk logged; we already had some plans in place”.

In this newsletter I talk about risk registers and risk management principally in the context of projects. But it matters for operations, too. We’re entrusted to provide foundational resources and expertise for researchers who need data, computing, and software development. It’s our responsibility to be aware of, and consistently revisit plans for, events that could risk our ability to deliver those resources or expertise.

Since the late 90s, free (both libre and gratis) offerings have been so much the norm that it’s understandable that we’ve grown to take them for granted and not view them as risky (though maybe less understandable is surprise about this particular case, after all of the discussion when CentOS transitioned to CentOS Stream so recently).

But it’s always been true that depending on some other group to provide something to you for free has some degree of risk associated with it, which must be managed.

And yes, it is undoubtedly true that paid offerings sometimes also get cancelled without replacement unexpectedly. This has happened to me in both my professional and private life, and I have been Greatly Displeased each time. But we are professionals, and colleagues. Let’s not pretend to each other that this happens at remotely the same rate or with the same consequences with which free offerings disappear or fade away.

People who don’t want to think about this stuff will doubtless summarize what I’m writing here, uncharitably, as “He says we should all be paying for everything or we deserve what we get”. That is emphatically not what I’m saying.

Given what we do and the constraints under which we do it, yes, we should continue to take advantage of free offerings and free software! It would be ridiculous not to.

The evolving environment does mean that we should be very conscious when we adopt such offerings in foundational ways, and make sure we have plans and alternatives should the offerings change. It also means we should revisit these plans and mitigations periodically and update when necessary. Mitigations could mean having backup plans including ideas for transitions; it could mean contributing to the free offering in some way (bug reports, PRs, etc.) so as to be seen to be a valuable member of the community.
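For teams that don’t yet keep an operational risk register, here is a minimal sketch of what one could look like in Python. Everything in it is an assumption for illustration: the field names, the 1-to-5 scoring, the six-month review interval, and the example entries are all invented, not a standard or a description of any particular team’s register. The point is only that each foundational dependency gets a named mitigation and a date by which the plan is looked at again.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Risk:
    """One entry in a hypothetical operational risk register."""
    dependency: str      # the offering or resource we rely on
    event: str           # what could change
    likelihood: int      # 1 (rare) to 5 (almost certain); illustrative scale
    impact: int          # 1 (nuisance) to 5 (service stops); illustrative scale
    mitigation: str      # backup plan or transition idea
    last_reviewed: date
    review_every: timedelta = timedelta(days=180)  # assumed review interval

    @property
    def score(self) -> int:
        # Simple likelihood-times-impact ranking
        return self.likelihood * self.impact

    def review_due(self, today: date) -> bool:
        return today - self.last_reviewed >= self.review_every


def overdue(register: list[Risk], today: date) -> list[Risk]:
    """Risks whose plans are due for a revisit, highest score first."""
    due = [r for r in register if r.review_due(today)]
    return sorted(due, key=lambda r: r.score, reverse=True)


# Invented example entries
register = [
    Risk("free CI tier", "usage limits cut back", 3, 4,
         "document steps to move jobs to a self-hosted runner", date(2023, 1, 10)),
    Risk("free enterprise Linux rebuild", "upstream source access changes", 4, 5,
         "test our stack on an alternative distribution", date(2023, 2, 1)),
    Risk("sole maintainer of the scheduler config", "leaves the team", 2, 5,
         "cross-train a second person", date(2023, 9, 1)),
]

for risk in overdue(register, today=date(2023, 9, 24)):
    print(f"{risk.score:>2}  {risk.dependency}: {risk.mitigation}")
```

A spreadsheet does the same job just as well; what matters is that the list exists and that someone is prompted to revisit it.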

Regardless of how we address it: as people with significant responsibility entrusted to us, responsibility that affects researchers’ ability to advance science and scholarship, it’s important for us to be aware of the shifting risk landscape around us.

We should be alive to and monitoring the risks of crucial dependencies in the services and resources we offer. That’s true whether it’s crucial team members leaving, the terms of a free tier being drastically altered, funders changing their compliance requirements, or anything else.

The fact that those risks exist doesn’t mean we don’t hire great people, never use free tiers of anything, or refuse money from inconstant funders (which is all of them).

It just means we don’t take them for granted.

RCD shares academia’s greatest weakness — refusing to acknowledge that people’s time matters and has value, even our own.

That can lead to taking contributions from within our community, and from external organizations, as a given and as something that we’re entitled to.

The recent Red Hat issues, when combined with other recent changes to our environment, will be well worth it if they remind us that this isn’t the case, and that the responsibility entrusted to us requires a level of clear-headedness and professionalism around the value of things we rely on, dependencies, risks, and mitigations.

In other news, I’m toying with the idea of doing a “No-Nuance November” this year, during which the newsletter issues will be a little more… blunt. Let me know if the idea appeals, and if there are any requests for topics.

And again, I’d really appreciate hearing from you if you’d like to talk about problems you’re facing, things you’d like to read more about, or anything else. Please feel free to email me at jonathan@researchcomputingteams.org, or schedule a quick call with me to give me input or ask me questions! I’ll be quite responsive now that I’m back.

And now, on to the roundup!

Managing Individuals, Teams, and in Organizations

Over in MPHD #161, I wrote about the importance of the manager in making connections - from team members to tasks, resources, peers, skills, mentors, and others outside the team. In the roundup we covered:

“Puzzle-piece placement” as a mental model for hiring
“Sprint Zero” as a blend of project kickoff and building technical foundations
Avoiding blowback when giving or receiving tough feedback

In MPHD #162, I talked about the importance of understanding underlying dynamics of a problem before trying to solve it on your own, and introduced Michael Lloyd’s idea of dysfunction mapping. In the roundup we covered:

Research lab handbooks
Questions to ask at retrospectives
Managing stakeholders’ attention budgets
Writing good handover documents

Technical Leadership

Hiring, Managing, and Retaining Data Scientists and Research Software Engineers in Academia: A Career Guidebook from ADSA and the US-RSE - Van Tuyl, (Ed.)

This 129(!) page guidebook, put together by both US-RSE and the Academic Data Science Alliance, is a great, comprehensive, and clear distillation of the communities’ thoughts on hiring for RSE or Data Science roles. It should be a starting point for thinking about hiring, especially for a new team.

Strategy, Positioning, and Marketing

Navigating change part 2: influencing stakeholders and strategy - Michele Ide-Smith

Ide-Smith has worked as a UX designer and then product manager and service coordinator for Europe PMC at EMBL-EBI. In Part 1, she writes about introducing her team(s) to new ways of working. Here she talks about influencing outward, to stakeholders.

In this post she describes her approach:

Provide data and evidence to build trust
Use open APIs and dashboards to showcase data
Make your work visible
Focus on impact
Show how to increase user satisfaction

(She also recommends the excellent book/site Impact Mapping, which a quick search suggests I’ve never discussed here? We should absolutely fix that).

The article is well worth reading, as are the resources she points to.

Maybe relatedly on the “provide data” and “make your work visible” pieces, over the summer break I wrote a comically long piece on gathering testimonials and success stories for our teams by interviewing researchers; it needs an edit but contains everything I’ve learned about doing such data gathering and presentation.

A related point, from the point of view of an internal platform team at a tech company, comes from this article, “Unmasking hidden value: leading Cost Centers”. Our teams generally tend to be seen as cost centres, as regrettable but necessary costs of doing business as a research institute. If we want to be seen as more than that, and in particular be seen as an area where more investment is not only necessary but actually advantageous to the institution as a whole, we have to be constantly demonstrating the positive value we bring, just as Ide-Smith recommends above.

Research Software Development

Mojo🔥 - It’s finally here! - Pramod Ramarao, Modular
Is Mojo the Fortran for AI Programming, or More? - Timothy Prickett Morgan, The Next Platform

In #167 I mentioned Mojo, a Python-compatible programming language that aims to be very fast for large-multidimensional-array calculations - something of great interest to AI, of course, but also to numerical computing. It’s now generally available, with free registration.

Obviously its success isn’t pre-ordained — we’ll see how it does. But given the interest in the AI/DL/ML communities it certainly has a significant head start, and it could be really interesting for more traditional numerical computing in our communities as well.

There are a couple of takeaways suggested by the interest that it is gathering:

Between this and interest in Carbon as a next-gen C++, I think we’re starting to see limits to how well completely de novo programming languages can do in established domains. Enthusiast communities like those surrounding Julia and Rust can only take their languages so far in areas where there are huge installed software bases that just have to be interoperated with. (Rust’s inroads into embedded, on the other hand, benefit from the fact that embedded software necessarily has comparatively few dependencies.) Other software development communities have been seeing this for ages: witness the last 10+ years of JVM-targeted languages, which dominated enterprise computing.
We’re probably entering, or will be entering soon, an era where the AI tail wags the research computing dog, not just in hardware but in software as well. If interest in AI remains at anything like its current levels, it will be calling the shots for programming language and library/framework development, and the rest of numerical computing will have to adapt and adopt those technologies.

Arguably related to the above, here’s a nice tutorial on MLIR for beginners. It may end up being a very useful target for other forms of numerical computing DSLs.

“Waterfall” doesn’t mean what you think it means: here’s why I think it’s far superior to anything that we’re doing today - Kris Brandow

This is a fascinating look at three key papers from the 50s, 70s and 80s which authoritatively described what became known as the waterfall method of software development, and I think everyone involved in big software development efforts should at least look at Brandow’s summary.

Basically, we’ve been outrageously slandering the original waterfall method for my entire career (which is not, at this point, a short period of time). The papers describe incredibly pragmatic approaches to software development, with, yes, multiple stages, but with lots of iteration, with creation of prototypes in areas of high uncertainty, with clear-eyed risk assessment, and with lots of documentation and then more documentation.

Higher Education and Research Forges in France - Definition, uses, limitations encountered and needs analysis - Le Berre, Jeannas, Di Cosmo & Pellegrini

An interesting and thoughtful overview of the roles software “forges” like GitHub/GitLab/SourceForge(?) play in supporting research, as well as the pros and cons of commercial vs community vs national vs self-hosted forges.

As is often the case, I think the costs of operating these kinds of (now very multi-functional) platforms at the level of availability and functionality we’ve come to expect are significantly underestimated, as are the costs incurred by fragmenting communities; but in the spirit of being aware of risks of dependencies, this is a timely discussion to be having.

Research Data Management and Analysis

Much has been written, mostly sarcastically, about Python now being a part of Excel. I actually think this is going to be a big deal for a lot of us who work with researchers that have established Excel workflows.

One of the huge problems in collaborating with them has been that there was no incremental way to move these workflows to R or Python. All you could do was export the data as a CSV at the end of an Excel workflow and then start a new workflow.

Python-in-Excel will let us get Python into their workflow earlier on, which I’m fairly optimistic about!

Has anyone tried this with an Excel-using researcher yet? Let me know.

Research Computing Systems

Overcoming Ops Debt - Chris Dwan

Dwan has a really nice article here about Operations Debt, the process and expectations cousin to technology debt. His recommendations:

Create a single front door for all requests
Have a person in charge for triage / dispatch (the person/process equivalent of the single front door)
Enforce this
Use that to be transparent about what current priorities and efforts are, so that you can be clear that one-offs are taking time away from what everyone agrees is a priority.

You can always tell that Dwan writes based on hard-won experience:

Here’s the fun part: You’ve got to wait 3 to 6 months for people to stop hating the change.

Everybody wants change, as long as it’s happening to someone else.

When it affects them, human beings universally hate change, and all you can do is be politely firm and wait for people to get used to it.

Emerging Technologies and Practices

Podman v4.6 Introduces Podmansh: A Revolutionary Login Shell - Lokesh Mandvekar

I know a couple of teams that have tried to roll their own version of this with Singularity.

It can be useful for a number of reasons to be able to carefully control the level of host access even an authorized user has from a shell. Here Mandvekar describes a feature of the most recent Podman release, which makes it easy to create and configure containers in which the login shell runs.

I continue to follow with interest stories of quantum computing programs being introduced into curricula or entire new programs, especially when they’re partnered with industry.

Random

An overview of the multithreading finally coming to Python 3.12 and 3.13.

A simple text adventure that aims to teach bash - bashcrawl.

I know many of you come here primarily for curated links explaining why using only a single space between sentences, as if a sentence break is the same thing as a word break, is wrong and weird and mostly an artifact of recent technology. I aim, as always, to serve.

WordStar, which genuinely was a great word processor, especially for those with a programming mindset (“reveal codes” let you see what was going on behind the formatting), seems to be having its day again. Here’s a WordStar clone, and here’s a WordStar-to-Markdown converter.

FFTs to tackle the 3D knapsack problem, an approach which strikes me as extremely surprising.

Rendering a scene by entirely implementing a compound-lens film camera in Blender (video).

Rubik’s cube in sed.

I enjoy this genre of articles: “author asks what sounds like a simple technical question, realizes it isn’t simple, sets off to figure out the answer”. In this case - how pings go through NATs.

Detecting solar flares and gamma-ray bursts for less than $100.

That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

RCT Newsletter