#130 - 16 Jul 2022

'Strategy', 'Tactics' Considered Harmful; In praise of process; Enlisting advisory boards; There will always be more problems; Semi-synchronous conferences; FLUTE for federated learning research; Buy and lease, not cloud vs on-prem; Rocky linux build infra

Hi!

I want to write one more thing about strategy this week before going back into the basics of managing our research computing teams.

It turns out I’m still a bit discouraged by one section of the recent (very good!) PEARC21 report on strategic planning for RCD groups. That part described all too clearly the confusion and helplessness many team leaders of teams like ours feel when asked to do some positioning and strategy work. Like so many things, we’re thrust into responsibility for these kind of activities without ever being taught anything about them. It’s intimidating. But it shouldn’t be, and doesn’t have to be, that way.

The thing is, finding a path forward for a team or organization is a research and problem-solving exercise.

There’s an outcome the team wants to achieve, and some problems are preventing it from achieving that outcome. First one has to research the problems and the current condition, and figure out how to make use of what’s available. Then one chooses a solvable problem, and comes up with an implementable solution that gets one from the current point, through the problem, closer to the outcome.

We are people of science. Whether we trained in science or not, we work daily now to support science, and solve complex problems. We are good at researching situations and coming up with creative solutions to problems.

But when the discussion turns to “strategic planning”, all the needless puffery surrounding the topic drains us of our confidence and dulls our instincts. There’s these important-seeming words with shifting meaning: strategy, objectives, tactics, pillars, goals. It’s disconcerting. Is this something I should already know? When does something stop being a strategy and start being a tactic? Wait, that’s a goal, I though you said it was an objective? Do I even belong in these conversations?

None of that matters. Words are tools to aid us. The purpose of that jargon is to help us by providing a shared mental model. If they’re not helping – and in many RCD planning conversations they are an active hinderance – then set them aside. Revisit them later, if you want. Those words aren’t what’s important. What’s important is the collaborative problem solving exercise.

Focusing on words instead of problems, form instead of function, leads to superficial discussions with forgettable results. It leads people to search for things that seem like would help, but don’t, like templates for strategy documents. Documents describing future directions are statements of a chosen problem and a proposed solution. There’s no “problem and solution template”. Solutions to problems aren’t uncovered by filling in the blanks on a .docx file downloaded from a website. Any such template constrains thinking rather than amplifying it.

Three sentences on the back of a napkin that actually help a team understand more clearly what it should be doing are worth infinitely more than ten pages of glossy fluff with the “right” section headings. The only purpose of a plan is to provide clarity for you and your community. It’s completely irrelevant whether someone outside that community thinks your document looks “official” or “sophisticated” or “strategic” or something. The goal is to have a final write up that makes you think, with some genuine enthusiasm, “Yes! This is the way forward, this solves a real problem for us”. Focussing on what an important document “should” look like leads to official-looking but wasted paper.

Creating clarity about a future that involves many people is hard. But working in science has taught us anything, it’s how to solve hard problems while collaborating with others. If concerns around form and format are distracting you at all from the work at hand, ignore them. You can always add the “right” words and structure later if you like, once you’ve done the meaningful work, the problem-solving work. But the work is for us, not to satisfy some imagined style guide.

Anyway, I’ve talked about strategy enough in the last couple months - back to the basics next week, I promise.

Are there other topics you’d like to see? Discussions you’d like raised? Questions you have? Please email me by hitting reply or sending an email to [email protected], or start a conversation on the #research-computing-and-data channel on the Rands slack.

Until then, on to the roundup!

Managing Teams

Technology isn’t the Hardest Part - Jonathan Dursi (me)

Earlier in this week, I had the really great opportunity thanks to some readers to attend and give a talk at the Bioinfo-Core session of ISMB2022. This is a long-running group of bioinformatics core facility leaders that meet routinely to discuss topics like emerging technologies, computational demands, managing specialized teams, funding models, and the like. The meeting and discussions were fascinating and useful.

I’m fascinated by bioinformatics core facilities because on average they are well ahead of a lot of other research computing and data teams in terms of having a product and service orientation, and thinking carefully about operational and fiscal management. They also often combine some element of research computing systems, data science and data management, and software development expertise. Honestly, I tend to think of them as a view of a possible future of RCD in general - domain-focussed, deeply immersed in the science, and cross-functional. But they also face the same problems the rest of us do.

In the session I gave a bit of a whirlwind 20 min talk on the very different balls we have to juggle managing our teams - people, processes, products, and potentialities (the future; maybe “positioning” would have been a better p-word). I claim that our scientific mindset and expertise can be a huge help in those four very different problem domains, if we apply them to how we work not just the subject matter stuff. With people, as I’ve written before, we have the advanced collaboration skills from scientific projects, and typically just need help with the basics. With processes, we know all about reproducible protocols, and just need to apply them more widely. With products, we can lean on our experience bundling work product and expertise up into digestible chunks like papers and talks; and with positioning, we just need to apply the same focus we see successful PIs have. And of course, experimentation for testing hypotheses and changes is valuable everywhere.


Process has a bad rap - Paulo André

That’s right, I said it in the talk above - in public and everything. I’m a big fan of documented processes. Not processes for their own sake, but processes that already exist but aren’t documented anywhere.

But! “Process” gets a bad rap, especially in our line of work - research institutions tend to have accumulated over time huge lists of bureaucratic processes that stifle getting work done, instead of helping by making it clearer how to do things.

André talks about how to avoid that - processes should have clear purposes, documented with the process - maybe even purposes for each step. Then it’s clear why things are the way they are, and if the situation changes, they can change too. To help that, the processes should explicitly be owned by someone - preferably in the team doing the work - so it’s clear that they can be changed. Otherwise things become too hard to change, and when they can’t change easily they get stagnant. Finally, processes shouldn’t be more prescriptive than they have to be.


Technical Leadership

You will always have more Problems than Engineers - Matt Schellhas

A reminder from Schellhas that no matter the size, level in the organization, or funding levels, there’s never enough people to do everything good and useful that could be done. That means, yes, leaders who ruthlessly prioritize are key. But it also means training team members to prioritize. Applying technical judgement to possible activities is a crucial skill for everyone, from juniors up, and the sooner they loon it the stronger they’ll be.


Working With Decision Makers

Building a Board of Fundraisers - Nonprofit Hub
“Plaque and Sack”: The art of getting rid of terrible board members while making them feel appreciated - Vu Le, Nonprofit AF

A lot of our teams, especially at Universities. have something like a “scientific advisory board”. And many of those groups are…. well, let’s be honest, kind of useless. That’s partially on us. It’s easy and comfortable to slip into a cycle of benign mutual neglect. Changing that is scary. Asking them to be more involved… well, what if, all too believably, they tell us to start doing the wrong things?

But such groups can be powerful allies. They can provide input on and validate high-level direction, reach out to their communities for us, and advocate for funding for our group whether from funders or institutions.

Those of us with those sorts of groups can learn a lot from from nonprofit leadership and how they work with their boards. Back in #61 there was an article on board reports and using those to have engagement at the level they can most help us. Here we have two articles urging us to help shape our “boards” to play a strong, constructive role in our teams - which is helpful for us and more satisfying for them.

The first is simply to emphasize fundraising when developing and maintaining the advisory board. Fundraising is very different for charities than academia, but the basic principle - board members should be advocating for funding our efforts and finding opportunities for us - is the same.

The second is that board members who have stopped being helpful contributors should be gently offboarded to make way for those who can help. (Le has a… very specific writing style).

Implicit in the two articles above are that the board should have terms and clear ways of bringing new people on and having people leave, as well as regular meetings with business to attend to. This is important; it’s way too common in our space to just hard-code a few names and then not give people anything to do. When setting up such a group, a nonprofit board is an excellent model, with (simple) bylaws, expectations, meeting schedules, and staggered terms.


Science Policy, Funding, and Research Computing & Data

Harvard Debuts New University Research Computing Organization - Oliver Peckham, HPC Wire

Peckham goes out and interviews some key people and reports the details of Harvard’s new University-wide Research Computing Organization - in #128 we only had the LinkedIn post of Scott Yockel, the head of the new office, to go by.

Hopefully we’ll see more institutions move this way and away from the fragmented teams scattered around campus. Having Harvard’s imprimatur on this kind of effort will hopefully make other institutions start thinking about it. Obviously, having Ivy League money to throw around helps get it done faster - Yockel’s org has $2.8 million in internal funds to create an integrated environment - but being cash-strapped makes it more, not less, important to bring small pieces together into a coherent whole.

Importantly, the RCO will include data management, computing systems, and software development.


Ex-Google chief’s venture aims to save neglected science software - David Matthews

A somewhat longer, more detailed, followup on Science’s initial article (#106) on what’s now the four-university “Virtual Institute of Scientific Software”, funded by Eric Schmidt’s “Schmidt Futures” foundation. Touched on is a real advantage of having a larger critical mass of expertise:

What’s more, software engineers tend to be scattered and isolated across a university. “Peer development and peer community is really important to those types of positions,” says Stone. “And that would be extraordinarily rare in academia.” To counter this, VISS centres hope to create cohesive, stable teams that can learn from one another.

Again, it would be awesome if we didn’t have to rely on patrons from the tech industry to get some stable funding for maintenance of key pieces of research software:

Meanwhile, the University of Washington has been “flooded” with requests from researchers there who say they need engineering help, says Sarah Stone, a data scientist at the university who is helping to coordinate its VISS centre.


Product Management and Working with Research Communities

Twitter thread explaining how the highly asynchronous Cosmology From Home series of conference (just finishing it’s 4th installement) is run:

  • It is not intended to be a replacement for in-person conferences
  • Prerecorded talks, of a given length but any format, are distributed ahead of times
  • Dedicated slack channel per talk with the speakers monitoring
  • Talks can be advertised at any time for a more synchronous watching experience for those who want it
  • Emphasis is on discussion, including suggested themes

I’m really looking forward to a future where there’s a variety of well established conference formats, synchronous and asynchronous, in-person and remote, that conference and workshop organizers can pick-and-choose from to achieve particular ends.


Fun paper (and some highlights from the paper) on Amazon’s DynamoDB, tracing out ten years of evolution of a very focused (and highly used) software product, DynamoDB. Like Python Tutor in #127, these longitudinal papers that trace out the path of a product over a long period give a fascinating glimpse of the mindset in driving an effort over time.


Research Data Management and Analysis

FLUTE: A scalable federated learning simulation platform - Dimitriadis, Garcia, Diaz, Manoel, and Sim, Microsoft Research

As research computing starts getting more and more involved in sensitive data - not just health, but social sciences, too - privacy-enhanced methods of analysis are going to become more and more important. So I’m really pleased to see Microsoft Research come out with an open-source framework for studying and experimenting with federated learning, FLUTE - there’s a paper as well. Unlike say Flower, or (disclaimer: I work at NVIDIA) NVIDIA’s FLARE, which are meant for production use, this is meant for research into federated learning. But having well-designed testbeds for developing and prototyping federated learning methods is vitally important if we’re going to put more federated approaches into production.

There’s a lot to like here - because it’s for experimentation, it’s really aimed at running on a single cluster (it uses MPI internally), can scale up to thousands of clients, has support for tonnes of data types, models, data sets, and optimizers, and has an expandable plugin architecture.

FLUTE client-server architecture and workflow. First, the server pushes the initial global model to the clients and sends training information. Then, the clients train their instances of the global model with locally available data. Finally, all clients return the information to the server to aggregate the pseudo-gradients and produce a new global model that will be updated to the clients. This three-step process repeats for all rounds of training.


Research Computing Systems

Rocky Linux 9 arrives with everything you need to replicate the distro on your own - Steven Vaughan-Nichols, ZDNet

Rocky, the new RHEL clone intended to replace CentOS, is out with their version 9 - but more importantly, so is their build system for creating releases from the upstream. Not only will having proper automation around this make release turnaround times, which is already a good thing. This also means the work continue if something falls apart in the governance of this effort, and that it will be much easier for small efforts to create other custom distributions from Rocky or RHEL.


Webinar: Cloud Versus On Premises Architecture for Life Sciences (YouTube link) - Ari Berman, Bioteam

Berman, whose organization has spent a lot of time working with wide range of computational life sciences teams in academia and industry, gave a webinar on “cloud versus on premises”. It’s a good talk, though I got a bit salty about one of the slides.

But I’d like us to move past this debate. Right now, AWS or GCP will deliver their cloud hardware into your data centres to run there, if you want. Various commercial software can be subscribed to to manage infrastructure control. Hardware can be leased, bought, sold back. If your data centre is a co-lo, so the premises aren’t yours, is it really on premises? And…

There’s a whole spectrum of options available today, and our community is still debating “on-prem vs cloud” like it’s 2012. It would do us and our researchers well to have more sophisticated discussions. The question isn’t “on-prem vs cloud”, it’s what should be bought outright and what should be leased for a given workload mix.

Here are some real options for moderately-sized teams charged with delivering the capabilities of a system to users right now:

  • Buy the hardware outright, install all open-source software,
  • Buy the hardware outright, install a lot of “rented” commercial software (or commercial support of open source software)
  • Buy some hardware outright and install something like Azure Stack, renting a lot of the infrastructure software and control plane
  • Lease the hardware, with any of the software options above
  • Host any of the hardware options above in a datacenter you own, “renting” the data centre operations staff’s time (cooling, power, internet, physical security..)
  • Any of the hardware above in a leased space where you control and have to manage the utilities, renting the same staff time
  • Any of the hardware above in an an institutional or commercially-managed space where you are now “renting” the space and their data centre operations staff
  • Rent medium-term reserved hardware at a cloud provider, combining data centre ops, some systems ops, and the hardware for moderate periods of time
  • Mix of the above plus short-term “renting” of on-demand or shorter-term contract nodes/services
  • All short-term
  • “Hybrid”, which just means a mix of any two or more of any of the combinations above

It’s not “A vs B” any more, and it hasn’t been for ages. We have a widely diverse research community with its huge range of use cases to support, and we need to think in more sophisticated ways than obsolescent binaries.

(And here’s a controversial take: You know who in our institutions are really, really smart about leasing versus buying for big capital needs? Who have taken, like, whole postsecondary courses on the topic? The folks in the finance department).


Emerging Technologies and Practices

While the quantum tutorials review last issue in #129 was more about a study of pedagogy, Chiara Decaroli’s Guide to Online (Mostly Free) Resources for Learning Quantum Computing gives a list of resources for the learner of those wanting to learn more about quantum computing. Readers of this newsletter are probably interested in the “With Math” sections.


Random

Make fizzbuzz. No, not “compile a program to run fizz buzz” - fizzbuzz implemented with GNU Make and seq. Followed up by one entirely in GNU Make.

How slack bots and chat ops can almost go terribly wrong together.

I love performance debugging stories - here’s one where memory zeroing in parallel was dramatically slowing things down.

And here’s one puzzling out why two identical SQL queries differed in speed by a factor of almost 6.

You can “dig uk” or “dig ca” but you can’t “dig ch”. Why can’t you dig Switzerland?

DARPA is trying to figure out how to secure open source.

It’s perfectly normal for a build system to require thousands of words to explain how a tiny but crucial piece of it works, right? Anyway, here’s part multi-part series into how caching works in Bazel.

Tail call optimizations in webassembly.

A great walk through with nice visuals of Delaunay Triangulations.

Research computing is fun and weird. Here’s a Rust package for estimating the size of a critter that pooped.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.