#128 - 2 Jul 2022

Bundling expertise into services; How to be an RCD recruiter; Bare-bones project management; Pair programming; Talk about Data Ethics Club; How to adopt an SRE practice

I heard a number of answers back about what sorts of service request processes people were using:

  • A lot of software development teams, of course, took bug reports and small feature enhancement requests via GitHub/GitLab/Jira etc. ticketing systems
  • Several groups are using Znuny, a fork of OTRS
  • Someone juggling multiple projects used a time tracker to make sure reasonable amounts of time were being spent on various external projects
  • A technical leader of an internal platform team said that it’s mostly “‘support channel in Slack’ with complex issues moving to email” (but they had succeeded in at least getting the requests into the channel rather than DMs!)

One data science team had a longer answer that I want to quote here:

For a relatively small team of a few staff and about 10 students taking about 250 consultation requests a year (one-on-one help on data science topics, not help tickets for a service), we use […] – basically a form + spreadsheet (Google Forms/Sheets would work too). We have columns to track the status (incoming request, in progress, completed, etc.) and who has responded/been assigned, along with the info about the request. We don’t put other significant updates on the consults in this sheet – just for tracking status. We make sure our team brings the status of everything up to date at team meetings. Requests ping a Slack channel when they come in, and we determine who will respond. So we can make sure that someone took each consult, even if the sheet doesn’t get updated immediately (it should be, but it doesn’t always happen).

For longer projects, we have only 6-12 or so per year – those go on a separate sheet for recordkeeping/reporting. All updates are communicated via team meetings or 1:1 meetings.
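
The “requests ping a Slack channel” step of a workflow like that is easy to wire up yourself; as a minimal sketch, assuming a Slack incoming webhook and made-up request fields:

    import requests

    # Hypothetical webhook URL - create one via a Slack "incoming webhook" app.
    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

    def notify_new_request(requester: str, topic: str) -> None:
        """Ping the team channel so someone can claim the consult."""
        text = f"New consult request from {requester}: {topic}"
        resp = requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)
        resp.raise_for_status()

    notify_new_request("A. Researcher", "help choosing a mixed-effects model")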

That same team is also working with other allied teams to figure out a way to have a shared, integrated history of interactions with researchers.

I found all your answers really useful! I think it’s an important topic, too - it ties in a bit to the last newsletter topic.

We’re in the expertise business, and there’s a spectrum of ways to bundle that expertise into services and products and make it available to researchers.

For research efforts that require cutting-edge expertise for an open problem, it makes sense to expose that via longer, open-ended service engagements; breadth of experience makes sense to expose to the researcher via a consultation model with productized services (services, but a bit more structured, with a well-defined process internally and well-defined scope externally); and efficient procedural work makes sense to bundle as at least semi-automated, cookie-cutter products.

[Figure: the spectrum of ways of bundling expertise - open-ended, cutting-edge engagements on the left; productized consulting services that take advantage of broad experience in the middle; efficient provision of best-practices-driven products on the right.]

All else being equal, ideally a team will have a portfolio of ways to engage researchers along that spectrum. There’s a bunch of reasons:

  • Having a range of offerings means you can engage with different research groups where they are and based on their needs
  • With a range of expertise offerings, there are growth opportunities in domain knowledge for individual team members
  • There’s a path for efficiency - initial novel engagements at the high-expertise end, if they keep coming up, can get turned into productized services, and then (maybe) products.
  • Diversification means there are fewer single points of funding failure

I mentioned “exposing” the expertise to researcher clients in a few ways; my mental model here is of an API. And that’s why I’m so interested in how teams handle incoming requests for work from researchers. These different services lend themselves to different “APIs”, to different kinds of work request models.

Traditional request/ticket trackers work really well for very transactional requests, like IT service requests or utility or product interactions. “Please reset my password”, “login5 is down”, “bug report: when I enter zero in the form I get the wrong answer”, “please update to the latest version of this dataset”. There’s not a lot of scoping or diagnosis or collaboration here. There’s an issue or a service request, a couple of back-and-forths, and it’s fixed. Metrics like how quickly tickets get addressed make sense - faster is better. These systems are time-tested, efficient approaches for these sorts of brief and somewhat shallow interactions.
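
Part of what makes the transactional model work so well is that it automates cleanly; as a purely illustrative sketch (hypothetical org, repo, and token), filing that zero-input bug report via GitHub’s REST API is a single call:

    import requests

    def file_ticket(owner: str, repo: str, token: str, title: str, body: str) -> int:
        """Create a GitHub issue and return its number."""
        resp = requests.post(
            f"https://api.github.com/repos/{owner}/{repo}/issues",
            headers={"Authorization": f"Bearer {token}",
                     "Accept": "application/vnd.github+json"},
            json={"title": title, "body": body},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["number"]

    # "our-org"/"forms-app" and the token are placeholders.
    n = file_ticket("our-org", "forms-app", "<api-token>",
                    "Wrong answer when entering zero",
                    "Entering 0 in the form returns the wrong result.")
    print(f"filed issue #{n}")

A well-defined request, a couple of lines of automation, done - that’s the transactional model at its best.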

But they’re kind of crummy for longer-term, more collaborative engagements! If in my personal life I want to have a conversation with an expert to help me with something - a lawyer or a doctor or a contractor to help work out a remodelling effort - I don’t file a ticket. I certainly don’t have metrics about ticket closure time - some problems are bigger than others and may kick off months-long engagements, and that’s not obviously good or bad. The bigger engagement certainly produces documentation about the effort and the work product, but it’s not merely a series of back-and-forths on a web form or email - it’s much more distilled than that. Maybe if I don’t realize the question I thought was a simple request is actually a Big Deal, I start the conversation in some sort of transactional system, but it eventually gets migrated out.

There’s no one-size-fits-all here. As with the team using Slack that migrates complex issues to email, and the team using two different systems for two different engagement types, different work is just different and needs to be treated differently. I wish there were better tools for us to use! But if there are any simple answers out there, I haven’t seen them yet.

Thanks to everyone who responded! Now on to the roundup.

Managing Teams

How to be a Recruiter for RCD: Part I and Part II - Lisa Lowe, L3 Scientific Applications Consulting

The time’s long passed when we could simply post the generic HR job req for our openings on the institution’s career website and wait for the applications to roll in. I used to say “That may work for postdocs and faculty positions…”, but it increasingly doesn’t even work for postdoc openings.

These two articles from new newsletter community member Lowe lay out the work that needs to be done to get good candidates into the application process these days:

  • You need to be a recruiter
  • Identify your target audience and figure out how to reach them
  • Send targeted, customized messages

Yes, that’s a lot of work, but that’s what it takes:

I recently learned that in the space of 14 months, our management found only a single candidate who they felt was qualified enough to interview. After I did a recruiting blitz, they got 6 in the space of 5 days.

In part I, Lowe gives a worked example of looking for a junior HPC systems engineer. She targeted two groups of IT professionals - those who have recently gone back to school for training, and those looking to switch career paths. Then she specifically reached out to career services and/or departments at nearby institutions for the back-to-school crowd, and IT affiliation groups for the career-change market.

In Part II, Lowe shares the case of a more specialized mid- to senior-level role, where now you have to work a little harder:

  • Keep an eye on other jobs, so you know what else is out there (and it has other benefits, too!)
  • Potentially contact individuals now, with personal messages
  • Keep an eye out for contractors
  • Be more intentional about identifying and communicating the strong points and unique qualities of your org, particularly for those individuals.

How Do Individual Contributors Get Stuck? A Primer - Camille Fournier

Fournier lists some common places for individual contributors to get stuck - such as when they need to:

  • Finish the last 10–20% of a project
  • Figure out where to start: Starting something from scratch, or working with unfamiliar code or systems or teams
  • Do a different kind of work like project planning or navigating the organization or coordinating with external partners
  • Ask for help
  • Deal with surprises or unexpected setbacks
  • Pull the trigger and go into prod
  • Say no

And that they can sidetrack themselves by:

  • Brainstorming/architecture
  • Researching possible solutions forever
  • Refactoring
  • Helping other people instead of doing their assigned tasks
  • Jumping on fires even when not on-call
  • Working on side projects instead of the main project

The problem with Fighting Fires - Ed Batista

It’s worth hearing this simple message from time to time. Fighting fires feels great and important - it’s very satisfying! But as managers and leaders our job is to coordinate firefighting and - more importantly - fire prevention efforts. Constant firefighting is, too often, a symptom of valuing activity over effectiveness.


Technical Leadership

Learn the weekly rituals you should master as a software project manager - Jade Rubick

Project management, like people management, is something we’re typically thrust into without any training. If we came from research, we have some experience of managing research projects, which is good - but there the timescale is longer and no one is depending on our results month-to-month. RCD projects are different. Linking to this post from elsewhere, Rubick writes something that squares with my experience with RCD teams:

I see most engineering managers gravitate to two extremes with project management: either they don’t do much at all, or they go all-in on traditional project management approaches.

He outlines a pragmatic, aggressively simple approach to project management of the kinds we’re usually involved in; the context is software development but it applies more broadly:

  • Identify the next milestone - go from milestone to milestone, the next never any further than a month off
  • Update your project plan - weekly, spend an hour or so doing this: not in any fancy tool, just a document
  • Update your risk registry - again, not in any fancy tool
  • Send a weekly project update - a clear update to those involved in the project.

He argues that this should only take a couple of hours a week for the kinds of projects we’re most likely leading. (Obviously we might be involved in more elaborate projects - construction of a new facility - but we’re probably not running those!) Once that’s done, there will still be other work to do coordinating the actual effort the plan is for. But higher-level planning, Rubick argues, doesn’t have to involve huge tools and long documents. Project planning is there to support the work and the people doing or relying on it; it’s not an end in itself.
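
For concreteness, here’s what a weekly update in that spirit might look like - a made-up skeleton (project, dates and all), not Rubick’s own template:

    Project: Data portal migration                      Week of 4 Jul
    Next milestone: pilot group live on the new portal (target: 22 Jul)
    Done this week: auth integration finished; transfer script tested on sample data
    Up next: migrate the pilot group's datasets; draft user-facing docs
    Risks & changes: storage quota approval still pending - escalated, answer due Fri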


Managing Your Own Career

A Conversation with Mathematical Consultant John D. Cook - Krešimir Josić, SIAM News

Going solo is always an option for people with highly specialized skills. Josić interviews the famously prolific Cook, whose blog posts or tweets you’ve almost certainly read, about his experience starting his own consultancy.


Product Management and Working with Research Communities

Huge congratulations to Harvard for going big - 15 open positions as part of the creation of a University Research Computing Office, growing FASRC (the faculty of arts & sciences HPC centre) while adding data management and software development as part of a University-wide portfolio.


Research Software Development

Pair Programming - James Cross

Pair programming is one of those things that’s not necessarily hard, but is easy to do wrong. Like everything in our role, having clear shared expectations beforehand makes the difference.

The short list of recommendations here: have two people, each taking one defined role (driver and navigator); set time limits at the beginning; switch roles periodically; and still do code review at the end - the argument being that pair programming, for all its built-in review, can still lead to groupthink, and so the result could use external eyes.

On the other hand, we’re told not to worry too much about who to pair with whom - sure, there are advantages to having someone more experienced mentor a junior, but there are advantages to pairing peers, too, and having the juniors navigate is a useful experience.


RSE Group Informational Interview Template - USRSE

Thinking of starting a research software development group in your institution? Know of some others that already exist in the same kind of environment? This Google Doc is an outline of some questions to ask in an informational interview, to find out how they’re organized, what’s worked, what hasn’t, how funding works, and more.


Research Data Management and Analysis

Data Ethics Club: Creating a collaborative space to discuss data ethics - Di Cara et al, Cell Patterns, 100537

Data ethics is a very topical and interdisciplinary area, and a good way to engage a group of potential collaborators and researchers around data issues. The authors here describe the success of their topical discussion group:

Data Ethics Club is a fortnightly reading and discussion group held virtually that is currently hosted by University of Bristol staff and students. The hour-long lunchtime meeting is free to attend and open to everyone.

Discussions are organized, structured, and recorded via a GitHub repository (which already has a lot of great material in it).


Research Computing Systems

More accuracy with less precision - Lang et al, Quarterly Journal of the Royal Meteorological Society, 146:4358-4370

This paper was from October, and the announcement was actually made last May but somehow I missed it. It thus won’t be a surprise to people who follow weather and climate simulations more than I do, but maybe others will be as startled as I was:

Reducing the numerical precision of the forecast model of the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) from double to single precision results in significant computational savings without negatively affecting forecast accuracy […] ECMWF’s ensemble and deterministic forecasts will run operationally at single precision from IFS model cycle 47R2 onwards.

The imbalance between memory bandwidth and compute power available in research computing systems has been growing worse and worse over time. More and more groups are trying to figure out how they can use less memory, or at least less bandwidth, in their computations. Reducing the precision of variables is one way to do this. We’ve seen reduced precision in AI, of course, and there’s growing interest in mixed-precision methods (here’s a recent review for linear algebra and another for other methods).

But I hadn’t realized that the fairly staid ECMWF simulator, used for research but also for production weather forecasts, was running in production at single precision. What’s more, they used the computational and memory savings of switching from FP64 to FP32 to increase the vertical resolution, which led to an increase in the accuracy of the model (and makes comparison to medium- and extended-range forecasts easier).

Unless there are dramatic advances in memory bandwidth - either with memory technology or putting more computation closer to the memory - we’re going to see more of this. It’ll require a lot of interesting algorithmic work and code changes! But the benefits are pretty clear.
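
To make the bandwidth argument concrete, here’s a toy NumPy illustration (nothing to do with the IFS itself): the same data in single precision is half the bytes to read and write, at a cost in accuracy that’s often acceptable:

    import numpy as np

    rng = np.random.default_rng(42)
    x64 = rng.random(10_000_000)      # float64 by default
    x32 = x64.astype(np.float32)      # same data, half the bytes to move

    print(x64.nbytes // 2**20, "MiB vs", x32.nbytes // 2**20, "MiB")  # 76 vs 38

    # Cost of the narrower storage: for this well-behaved reduction, tiny.
    ref = x64.sum()
    err = abs(x32.sum(dtype=np.float64) - ref) / ref
    print(f"relative error from float32 storage: {err:.1e}")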


Filtering numbers quickly with SVE on Amazon Graviton 3 processors - Daniel Lemire

Wouldn’t it be awesome if we didn’t have to recompile code every time the next generation of processor had incrementally better or bigger vector operations? Many new and upcoming generations of Arm chips (yes, NVIDIA will have one) will have old-school, vector-processor-inspired “scalable vector extensions” (SVE) instructions. As Lemire says,

What is unique about SVE is that you work with vectors of values, but without knowing specifically how long the vectors are. This is in contrast with conventional SIMD instructions (ARM NEON, x64 SSE, AVX) where the size of the vector is hardcoded.

RISC-V has something similar and I couldn’t be happier. I love the “back to the future” aspect of it, and hope that it eventually makes maintaining code across architectures easier as more processors adopt something related.


A look inside our sixth generation of server hardware - Eric Shobe and Jared Mednick, Dropbox

A quick overview of what Dropbox’s next generation of storage systems looks like - 20 PB per rack, with 100 drives per chassis behind a single (!) 100 Gb NIC.


Emerging Technologies and Practices

How to Adopt an SRE Practice (When You’re not Google) - Jemiah Sius

Sius tells us you don’t have to be Google-sized to adopt some Site Reliability Engineering (SRE) practices on your team. The main things are to have clear service level expectations, to understand where the risks of not meeting them come from, and to have someone whose responsibility it is to guide the team towards meeting those expectations.
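
Those service level expectations translate directly into numbers you can manage against; as a minimal sketch (the targets here are made up), an availability objective implies a concrete monthly “error budget” of allowable downtime:

    def monthly_error_budget_minutes(slo: float, days: int = 30) -> float:
        """Allowed downtime per month, in minutes, for a given availability SLO."""
        return (1.0 - slo) * days * 24 * 60

    for slo in (0.99, 0.999, 0.9999):
        print(f"{slo:.2%} available -> "
              f"{monthly_error_budget_minutes(slo):6.1f} min/month of budget")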


Random

Setting up a BGP Autonomous System.

Sketching methods are becoming widely used in large-scale scientific data analysis, led in no small part by bioinformatics. Here’s a good overview of doing a min-hash similarity join - the code is in Scala for Spark, but the explanations are very clear and apply more broadly.
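
The min-hash idea itself is compact enough to sketch; here’s a toy Python version (not the post’s Scala, and real implementations are much more efficient) showing how the fraction of matching signature components estimates Jaccard set similarity:

    import random

    def make_hashers(k: int, seed: int = 0):
        """k independent-ish hash functions over arbitrary hashable items."""
        rng = random.Random(seed)
        return [lambda x, s=s: hash((s, x))
                for s in (rng.getrandbits(64) for _ in range(k))]

    def signature(items, hashers):
        """Min-hash signature: the minimum hash of the set under each function."""
        return [min(h(x) for x in items) for h in hashers]

    def estimate_jaccard(sig_a, sig_b):
        """The fraction of matching components estimates Jaccard similarity."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    hashers = make_hashers(256)
    a = set("the quick brown fox jumps over the lazy dog".split())
    b = set("the quick brown fox leaps over a lazy dog".split())
    true_j = len(a & b) / len(a | b)
    est_j = estimate_jaccard(signature(a, hashers), signature(b, hashers))
    print(f"true Jaccard: {true_j:.2f}, min-hash estimate: {est_j:.2f}")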

Join and Index (!?) with jq.

Dicts considered harmful? For data exchange, anyway.

I didn’t realize there was a petabyte-scale web crawl dataset freely available - Common Crawl.

The challenges of running and steering Atari.

Yet another hopeful SQL replacement that’s more of a proper programming language: PRQL. I’d love to see one of these succeed someday, and yet here we are in 2022 with GitHub littered with the corpses of SQL replacements.

Rsync is a sadly under-appreciated tool. Here’s how it works.
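
The clever core is a weak rolling checksum that can slide along a file one byte at a time at constant cost, so matching blocks can be found at any offset; here’s a toy Python version of the idea (Adler-32-style, not rsync’s exact formulation):

    M = 1 << 16

    def weak_checksum(block: bytes) -> tuple[int, int]:
        """The slow way: compute both components from scratch."""
        n = len(block)
        a = sum(block) % M
        b = sum((n - i) * x for i, x in enumerate(block)) % M
        return a, b

    def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
        """Slide the window one byte to the right in O(1)."""
        a = (a - out_byte + in_byte) % M
        b = (b - n * out_byte + a) % M
        return a, b

    data, n = b"research computing teams", 8
    a, b = weak_checksum(data[:n])
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
        assert (a, b) == weak_checksum(data[k:k + n])   # rolled == recomputed
    print("rolling update matches full recomputation at every offset")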

An opinionated Python testing style guide.

Why not just use an ad-hoc Dewey-decimal-style system to organize all your stuff? Johnny Decimal.

I love that embedded databases are now so commonly used and increasingly respected that there are articles like “SQLite or PostgreSQL? It’s Complicated!”
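
Part of the appeal is how little ceremony an embedded database needs - Python ships SQLite in its standard library, so a full SQL engine is an import away (the table and columns here are made up):

    import sqlite3

    con = sqlite3.connect(":memory:")   # or a filename, for a persistent single-file DB
    con.execute("CREATE TABLE consults (who TEXT, topic TEXT, status TEXT)")
    con.execute("INSERT INTO consults VALUES (?, ?, ?)",
                ("A. Researcher", "mixed models", "in progress"))
    for row in con.execute("SELECT * FROM consults WHERE status = ?", ("in progress",)):
        print(row)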

An introduction to looking at assembly code with gdb. If you’re interested in playing with these sorts of things, I cannot recommend Godbolt’s Compiler Explorer highly enough.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.