#123 - 28 May 2022

Research Computing Teams are Vendors, Too; The Manager's Handbook; The Turing Institute Guides; What teammates owe each other; When everything's important; Open Source Communities; HTCondor Product Management; Practical Guide to Big Research Software Estimation

Hi!

My new job, working for a large company that explicitly sells stuff, has indeed been eye-opening — but not in the ways I expected. Mainly, it’s given me clarity about some of my old jobs, and about peer groups I’ve worked with.

I’ve always been sort of puzzled by the decisions made and priorities chosen by some research computing and data teams. Those teams seemed more… insular, somehow, than others. I saw it less often in contract research software development teams, library research data management teams, or bioinformatics core facilities. I saw it more often in research systems teams — not all of them, of course, and not only them.

Now that I’m meeting with some of these old peer groups while wearing a vendor hat, and watching them interact with us, it’s becoming a lot clearer.

“The Sixth Sense” meme: “They don’t know they’re vendors”.

The thing is, in this field, we’re all vendors. But not all groups know it.

Thinking otherwise is a completely understandable trap to fall into. It’s especially seductive for people trained as academics who have stayed in the university. I fell for this early in my research computing and data career: “I’m part of the university/research institute, collaborating with my peers for free, same as when I was a postdoc”. Conveniently, this let me avoid focusing uncomfortably on my shift from a researcher to a research support role. It also meant that I could avoid making hard decisions about which research to support, how, and why. “Collaborations emerge organically, after all!”

But in Research Computing and Data we are all very much vendors, offering support and services (yes, including expert collaboration) to research groups who have choices about where and how to do their work. We may well have an extensive working relationship with a group and deep knowledge of their needs. They will still take their work elsewhere if they feel it would better advance their research. And they’d absolutely be right to do so.

Research group or RCT, we have the same mission — to advance research and scholarship as best we can. But our roles are different. The researchers know their work best and how best to advocate for it. If they choose to take that work elsewhere, that’s what’s best for the project. It’s our role to make it clear how we can (and can’t!) support the research with our services, and make available the resources necessary for that project to succeed with those offerings.

Sometimes the match won’t work. The researcher will decide their project or programme will best succeed using other teams’ offerings. As long as that decision is an informed one, that’s success! A better match was found, one that advances the research a bit further. That’s our mission. We showed what we could do, and they found another option. Our teams don’t own, and aren’t entitled to, research groups or projects.

We, too, can and should “opt out” of a match. Maybe the programme isn’t a good match to our offerings. (Perhaps we even recommend another team!) Maybe the research group isn’t ready to collaborate with us yet. Or perhaps it could be made to work, but it isn’t the right choice for us. It would take too many resources to support the effort well, and our mission — advancing science and our organization’s priorities — would be best served by allocating those resources elsewhere. Recognizing that is also a success. Particular research groups or projects don’t own us, either.

Matches and mismatches can both be failures. That has consequences for how we should lead our teams. Failures include:

  • The match didn’t happen because the two sides (‘vendor’ and research group) didn’t know about each other, or didn’t understand the other’s needs or offerings
  • The match happened and the project failed or sputtered out because not enough effort, or not the right kind of effort, was put in
  • The match happened and the project was successful, but it required far too much effort, effort that could have been better spent elsewhere
  • The match happened and the project wasn’t as successful as it could have been, because the services didn’t fit the project as well as was thought

So yes, we’re vendors. We’re “selling” open-source software, custom software development, data management, or systems services… or commercial equipment or software support. To best advance science and our institutional priorities, we should be working hard to make it clear what we can offer and what we can’t, and directing researchers and scholars elsewhere when that’s what’s best for them or us. That means listening and “marketing”, and accepting that we’ll often hear “no” (even when they could make it work) and that we will say “no” (even when we could make it work).

In a way, this realization that we’re vendors makes us more like successful research groups rather than less. The most successful research groups know that there’s a zillion projects they could work on, questions they could ask given infinite time. But resources and time are finite. So they laser-focus on the areas where funding is available, the skills are available, and the impact is high: where they can best advance research. They communicate their capabilities widely to attract collaborators, advocate for those projects, get used to hearing “no”, and turn down projects they could do, but won’t. They’re specialized, focussed, and relentless advocates and communicators. They also run highly effective teams.

With that, on to the roundup!

Managing Teams

The Manager’s Handbook - Alex MacCaw, Clearbit

This is a really solid, free handbook for new managers or people thinking of becoming managers. Some things I particularly like about it:

  • It covers likely failure modes right from the beginning
  • Lots of emphasis on hiring
  • It covers managing yourself early on - your behaviour and your mindset are the only things you really have any control over, and I think this is under-addressed in other resources
  • There’s a distinction implicitly made between the behaviours you need for managing individuals (one-on-ones, coaching, feedback) and for managing teams (working as a team, conflict resolution).

I don’t love everything about it - it includes things I wouldn’t, and leaves out things I would - but it was made for a particular company’s culture, not mine. It’s a thoughtful and solid resource to have to hand for pointing people to, for building on for your own organization, or just for reading (it’s always good to revisit the basics).


How to Respond When an Employee Quits - Rebecca Zucker, HBR

This is one of the situations where you see how far a new manager has come on the “managing yourself” skills. It feels like a disaster, even a betrayal, the first time a team member quits. It isn’t either, of course — it’s good and healthy for people to move on, and it’s an opportunity for the team as well.

The only correct initial response when a team member tells you they’re quitting is something along the lines of “I’m sorry to hear that, but congratulations!”. As Zucker points out, that isn’t easy, but it’s necessary. First, because you may work with that person in the future, or have opportunities to have their friends and colleagues join your team. Second, because it allows you to move productively to other important parts of the conversation, like asking for what you and the team need before they go, and possibly learning about things you could do to improve retention.


As we’ve said before, a team is a group of people that hold each other accountable. But for that to be possible and effective, there have to be shared expectations. In What New Teammates Owe To One Another, the team from Nobl suggests an onboarding document of team expectations that new team members are walked through. (Obviously this has to be hashed out with existing team members first!)


Technical Leadership

When Everything is Important But Nothing is Getting Done - Roman Kudryashov

Kudryashov walks us through a case study of getting a team unstuck:

The last company I worked for was a mid-stage startup with growing pains. What had started out as a nimble organization able to create impressive software now felt stuck. Everything was high priority, nothing ever seemed to get completed, morale was low, and it was starting to coalesce into a learned helplessness where the only solution seemed to be resignation…

You, gentle reader, and other long-time RCTers won’t be surprised at the core elements of the solution — ruthless prioritization and reduction of work in progress, activities dropped entirely, one project being worked on at a time, and a clear definition of done. But knowing the solution is the easy part. Kudryashov’s article spends a lot of time on the (hard! time-consuming!) other part: getting to the point where the solution is possible.

Like any big change management effort, the key factors that lead to success include driving a consensus that there is a problem, and that to address it some very big things are going to have to change. (Unfortunately, it’s too easy for people who should know better to fall into the sentiment of “I want things to get better, but I don’t want to change anything.”) Some parts of the problem of too-big projects or everything’s-top-priority can come from elsewhere in the organization, in which case those people have to be part of the consensus too.

And of course there has to be continual follow-up. The natural state of work is not clear, focussed work on widely-agreed-upon priorities. Instead, if organizations are left to themselves, entropy will build up and teams will find themselves in the same situation again. But as Kudryashov describes, it’s worth all this hard, deliberate, ongoing work:

It took roughly six months to make this transition and another three months to continue refining the process, but we were in a good place. Projects were unblocked. We delivered two major time-sensitive contracts… on time, and with historically low defect rates. Morale was up across multiple teams, which reflected in better satisfaction scores on employee surveys and more importantly on a radically reduced turnover rate (we went from a 50% turnover rate per quarter to something like 4% quarterly turnover, including zero turnover one month).


Product Management and Working with Research Communities

Uncurled - everything I know and learned about running and maintaining Open Source projects for three decades - Daniel Stenberg

Stenberg, best known for curl, has a great book on the project and product management of a successful open source software product. There’s very useful stuff in here for those hoping for their open source product to take off. Some of the points I find particularly valuable:

  • Just do it
  • The project is “we”
  • If it’s not alive, it’s dead
  • Newcomers can be awesome
  • Contributors will not stick around
  • Over time, maintenance grows
  • Volunteers make things different
  • Only releases get tested for real

How to Build an Open Source Community - benny Vasquez, The New Stack

Overlapping with but distinct from Stenberg’s book, Vasquez talks more about the governance of setting up a community. Vasquez describes personas for the possible levels of engagement you’ll see, engaging the right people early on, creating the culture and processes you want to see, and empowering community members.


HTCondor Week 2022 was this week, and I don’t think it’s commented on often enough what an excellent job has been done managing HTCondor as a product over the 34 years (!!) since it started as a cycle scavenger. As was outlined in Miron Livny’s talk,

What began as a hunter of idle workstations is now a manager of HTC workloads. “It’s not about the capacity anymore, it’s about the management of the workflows”.

That shift - from cycle scavenger to high-throughput computing workload manager - is a remarkable one, and too many research infrastructure efforts would have clung on to the original mission, eventually fading away into oblivion.

As a sign of the success of this approach, there were stories highlighted this week of people bringing their own resources (i.e. they didn’t need to scavenge cycles; they had pre-existing dedicated resources) and wanting help setting up HTCondor. They wanted to move to HTCondor because it was a really nice workflow management tool for high-throughput computing with a good researcher experience. That’s a remarkable product management success.


Research Software Development

A practical guide to research software project estimation - Chase Million

We know that waterfall-style, “design everything at the start” project management doesn’t work for research software. Unfortunately, the way most research software development efforts are funded, we kind of need to do that anyway.

Funders who are going to shell out $500k+ for an effort want, understandably, to see a plausible plan that makes them confident of a reasonable likelihood of success. Also understandably, they’re not overly concerned if the plan doesn’t play out as predicted — this is research, after all. And putting together such a plan, as Million points out, is a great opportunity to bring the relevant stakeholders together to hash out a consensus on what the right thing to build even is, what its scope should be, and whether you already have the people you need. Even modern agile practices almost always start large efforts with a big kickoff meeting where similar topics are discussed.

Million gives a good, practical, overview of how to plan out a large, multi-stakeholder research software project. The document is worth reading and/or circulating to novice stakeholders in advance of a grant proposal development meeting. The process the document describes naturally produces, as outputs, a consensus on what’s to be done and the kinds of rough-and-ready project planning documents that a funder will want to see. It’s quite good, and I haven’t seen anything as comprehensive.


Stripe is widely known within tech for having excellent API documentation. They’ve open-sourced Markdoc, an internal tool for generating rich, nice-looking documentation pages from augmented Markdown syntax. Markdoc is new, but it’s already attracting users. As someone who has always found reStructuredText powerful but confusing, I find this really interesting.


Julia is a really exciting language with a lot of advantages for research computing - people make amazing DSLs for things like differential equations using it. But this article by Yuri Vishnevsky describes some of the downsides that I’ve seen: inconsistent product management leading to a culture where serious correctness bugs or other brokenness can persist.


Interesting - Intel has a CUDA-to-SYCL conversion tool.


Research Data Management and Analysis

Desirable Characteristics of Data Repositories for Federally Funded Research - White House Office of Science and Technology Policy (OSTP)

So the OSTP’s Subcommittee on Open Science and the National Science and Technology Council have put together guidelines for data repositories for federally funded research (does anyone who understands US science policy know why this was done at this level and not by the granting councils, as elsewhere?). US funders are expected to make use of this document when deciding whether a repository is adequate as part of a data management plan, or presumably when funding such repositories.

The body of the document is only seven pages, and lays out the desired characteristics with commendable clarity. Nothing is shocking in here, but there are some characteristics I’m particularly pleased to see included and that naive repositories will have some trouble with:

  • Retention policy
  • Risk Management, and for sensitive data, Breach response plans
  • Organizational and Technical sustainability
  • Unique persistent identifiers
  • Curation and Quality Assurance
  • Provenance

The Turing Way - The Alan Turing Institute

If you’re setting up a data science/ML/AI group, or teaching students about those topics, this is a nice resource to have to hand. There are guides on:

  • Reproducible Research
  • Project Design
  • Communication/Dissemination
  • Collaboration
  • Ethical Research, and
  • Maintaining a community

Research Computing Systems

Really cool Arm stories coming out this week - Timothy Prickett Morgan over at the Next Platform sketches out a possible Ampere roadmap for the next few years, Amazon’s first Graviton3 instances look amazing according to Michael Larabel at Phoronix, and Microsoft is announcing a cute Arm-powered developer box and native Arm developer tools. Given the growing number of remote-development offerings coming out (e.g. VS Code has a new dev container CLI), that “Volterra” box will be useful for developing even for non-Arm systems.

Obviously I have a conflict here, as NVIDIA will be selling its own Arm CPU soon, but I don’t think it’s partisan to be excited about the explosion of credible CPU options for research computing. It’s fantastic news for the diverse range of needs we have in our profession.


Google’s joining the Open Source Security Foundation, working (with Snyk) on tools for securing the software supply chain with “assured packages” that undergo significant testing and quality control.


Emerging Technologies and Practices

Interesting update on Google’s TPUs that we talked about in #121 - the new TPUv4s will be in pods of 4,096 chips, with “dozens” of such pods available soon, most or all running on low-carbon energy.


Good interview between Tobias Mann at the Reg and Jim Pappas of Intel and the CXL Consortium about what the upcoming Compute Express Link (CXL) standard is and its likely role in composable systems. They’re walking a fine line between expectation-setting and what I think is genuine excitement about what will be possible:

“Over this next year, the first round of systems are going to be used primarily for proof of concepts,” he said. “Let’s be honest, nobody’s going to take a new technology that’s never been tried.”

What I keep hearing is that CXL 1.0 and even 2.0 will be more like proving grounds and prototypes, while when CXL 3.0 systems start landing things will be getting interesting.


Random

This is pretty cool - at shell.duckdb.org you can run analytic queries on any supported Parquet or CSV file on GitHub or elsewhere on the web, entirely in your browser. WebAssembly + embedded DBs for the win.

Relatedly, an interview about and architecture of Datasette, the sqlite-based query-a-dataset tool.

You likely all know this by now, but GitHub’s Markdown now has math support with MathJax. (It’s not perfect! GitLab made some different choices that arguably work a little better.)

Imagining an alternate history based on SAGE, the (military) Semi-Automatic Ground Environment, where team collaborative computing advanced further before the personal computer was born. It really is remarkable how some very sophisticated early approaches to using computers for collaborating on projects, from the 1950s to the early 1970s, just vanished from memory.

IBM’s 1957 Fortran compiler implemented order of operations with only parentheses and what is basically a sed script, and wow, that doesn’t look like it should work at all.
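To see why it does work, here’s a minimal Python sketch of the substitution scheme commonly described for the FORTRAN I compiler, cut down to two precedence levels (the full scheme adds another level for exponentiation). The function name is hypothetical, and it assumes binary operators only: no unary minus, no exponentiation. Each character is rewritten on its own, with no parsing at all, yet the resulting parentheses group `*` and `/` more tightly than `+` and `-`.

```python
def fully_parenthesize(expr: str) -> str:
    """Purely textual rewrite so that grouping by parentheses alone gives
    the usual precedence: * and / bind tighter than + and -.

    Hypothetical sketch: binary operators only, whitespace is dropped.
    """
    replacements = {
        "+": "))+((",  # low precedence: close two levels, reopen two
        "-": "))-((",
        "*": ")*(",    # high precedence: close and reopen only one level
        "/": ")/(",
        "(": "(((",    # an original "(" starts a fresh two-level nest
        ")": ")))",    # an original ")" closes that nest
    }
    body = "".join(replacements.get(ch, ch) for ch in expr if not ch.isspace())
    # Wrap the whole expression so the outermost operators are balanced too.
    return "((" + body + "))"
```

For example, `fully_parenthesize("a*b+c")` gives `((a)*(b))+((c))`: the `*` only closes and reopens the inner level, so `a*b` stays grouped inside the outer pair, while `+` splits at the outer level.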

Convert JSON to CSV with jq.
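For anyone who would rather stay in Python than learn jq’s filter language, here’s a rough stand-in using only the standard-library `json` and `csv` modules. It is not the jq approach from the linked post. It assumes the input is a JSON array of flat objects, and `json_records_to_csv` is a hypothetical helper name.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row.

    Columns are the union of all keys, in first-seen order; a record that
    lacks a key gets an empty cell. Nested values are not flattened.
    """
    records = json.loads(json_text)
    columns = []  # ordered union of keys across all records
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    # restval fills missing keys; csv handles quoting of commas and quotes.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Collecting the column list up front is the main design choice here: real-world JSON records often have inconsistent keys, and a fixed header from only the first record would silently drop (or, with `DictWriter`, reject) the extras.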

Continue to love all these query-data-files-in-place-with-SQL tools - here’s sneller, for fast(!) SQL over JSON.

Making JSON more useful in SQLite with virtual columns.
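A minimal sketch of the idea using Python’s built-in `sqlite3` module, assuming the bundled SQLite is 3.31.0 or newer (for generated columns) with the JSON functions available. The table, column, and function names are hypothetical, and this is my illustration rather than code from the linked article. Raw JSON goes into one column; a virtual column pulls out a field so it can be indexed and queried like any other column.

```python
import json
import sqlite3


def event_counts_by_user(events: list) -> list:
    """Store raw JSON documents, then group by a field exposed as a virtual column."""
    conn = sqlite3.connect(":memory:")
    # user_id is a VIRTUAL generated column: SQLite computes it from body with
    # json_extract when it is read, rather than storing a copy in each row.
    conn.execute(
        """
        CREATE TABLE events (
            body TEXT NOT NULL,
            user_id INTEGER GENERATED ALWAYS AS (json_extract(body, '$.user.id')) VIRTUAL
        )
        """
    )
    # Generated columns can be indexed like ordinary columns.
    conn.execute("CREATE INDEX events_by_user ON events(user_id)")
    # Only body is inserted; user_id is derived automatically.
    conn.executemany(
        "INSERT INTO events (body) VALUES (?)",
        [(json.dumps(event),) for event in events],
    )
    rows = conn.execute(
        "SELECT user_id, count(*) FROM events GROUP BY user_id ORDER BY user_id"
    ).fetchall()
    conn.close()
    return rows
```

The appeal is that the raw documents stay untouched, and new fields can be exposed later by adding more generated columns instead of reloading the data.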

Can’t find grid paper you quite like? Make your own with gridzzly.

3D graphics in the browser with WebGL… or CSS.

Good overview of colour schemes for scientific figures from the team at Northwestern.

Implementing a lock-free bounded concurrent-reader queue with only 32 bits of additional state.

Build a proof of concept distributed Postgres.

Love spreadsheets? Love software from the 90s? Love Linux? You can now run Lotus 1-2-3 on Linux.

Too much quiet work time? Wish you could get a macOS or iOS notification for every comment, issue, and PR on one of your GitHub repos? Trailer.app is here for you.

In a take that will infuriate most, the case for using tabs in some places and spaces in others.

A login-free and ephemeral docker image registry for, e.g., CI/CD so you don’t need to store credentials.

Hmm - log C function calls with Cosmopolitan Libc, which I hadn’t heard of before.

The case against Shapefile for geospatial vector data.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

If you’ve just had or are having a long weekend, I hope you enjoy(ed) it! Either way, good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.