#95 - 8 Oct 2021

Let me tell you about a mini-fiasco this week that was entirely my own doing.

In our team, we routinely hire students for semester-long co-op positions. It happens three times a year - I think we’ve taken part 12 times over the past five years. It generally works out pretty well, for us and the co-op student.

The process is pretty uneventful generally. Our tireless administrative staff, without whom the place would fall apart, lets me know that it’s time again; we post our usual job ad; we interview some students and submit a ranked list. We’ve lately been pretty good at having projects ready for them on day one.

We had a couple more potential student supervisors with projects this semester, which is good. In the past year we’ve been upping our game at hiring full time staff, and part of that is better job ads; so we wrote a much better job ad for the co-op position this year and that resulted in fewer candidates but who were overall much better matches for the team. Win-win!

And, my fellow managers and leads, I’m ashamed to tell you that that’s where it all fell apart.

Because we had fewer, better matches, and more potential supervisors, triaging the resumes - which was my job - was harder and I didn’t do it in time. Our admin, after an earlier prompt, let me know the last possible day to interview was the next day. I dumped a pile of resumes at end of day on our volunteer supervisors, who had hours to read them after hours and choose preferred candidates. Our interviews were ill-prepared, and we didn’t have as much communication with the candidates before and afterwards as we would have liked. A bunch of meetings had to be cancelled and rescheduled because there was only one window to interview the students. The remainder of the week was all messed up.

The outcome was ok, but not great - we found some really good candidates, but we shredded credibility with them because we were visibly unprepared. One has already been poached by someone in another team in our org, and if we had coordinated earlier we could have known about the shared interest. And it completely screwed up our week and stressed out volunteer supervisors - who don’t have to do this next time around.

So why are dates of something that happens on a schedule three times a year and has done so for the past five years coming as a surprise to me and the co-op interviewers? Why was I being a bottleneck for triaging resumes - why not have the supervisors do it? (Because that’s how I did it four years ago when I was the only supervisor). Why are we making our admin, who has other things to do, patiently shepherd us through an utterly routine occurrence that we should have down pat?

More fundamentally - the co-op student hiring process is both something we do all the time, and is the fastest feedback loop we have for improving our hiring process. Why, after 12 iterations, did I not have a runbook for doing this, with a systematic way of learning and improving both the co-op process and our interviewing process for hiring in general?

The answer of course is that it was going well enough, there were no flashing red lights, and there was always something else demanding attention.

But building a process for this at any time over the past four years would have been a very valuable activity for me. It would have helped me delegate tasks that I oughtn’t have been doing, improved our team’s interviewing and hiring skills, helped us find better matches for co-op students, and improved our co-ops experience throughout the process (which helps with referrals and full-time hires).

The situation is utterly goofy. Had I been talking to a peer and they told me they had this issue, I would have encouraged them to address it. And, in my own job - I just didn’t. Even when we know the right thing to do as managers to improve our processes and help our team, it’s really easy to be caught up in other priorities - “now’s just not the right time”.

Anyway: a learning opportunity. Next time through there will be a runbook, and it’ll go better. But this shouldn’t have happened in the first place.

Do you have processes in your current job that badly need updating - or have you improved one lately? How’s it going? Email me and let me know.

On to the roundup!

Managing Teams

LifeLabs Sample Interviewing Playbook - Life Labs Learning

Life Labs Learning, a management/leadership training company, has a Google Doc template interviewing playbook for team members that can be a starting point for a given team and role. It’s short but gives good guidance on creating the process for the manager, and how to conduct the interview for novice interviewers.

No-bullshit tenets for faster decision-making - Jade Rubick Breaking Through The Guard Rails - Roy Rapoport

These are about managing and decision-making generally, but they apply very much to hiring.

The key idea of both is that, while people often list the things they value in their team - “independence”, “delivers results”, “attention to detail” - listing them as standalone values is meaningless. Nobody thinks independence, delivering results, or attention to detail are bad things, so saying you value them communicates nothing and can’t inform decisions.

But these values have tradeoffs. They’re one end of a spectrum that has another end. For these to be principled choices rather than just feel-good assertions, you have to explicitly own that if you’re hiring for “independence” you’re also hiring for “chafes at detailed guidance”.

Rubick recommends specific “We value this, over this” structures to clarify team decision making, regardless of the context:

A good tenet should have a perfectly valid opposite tenet that would make sense in a different context.

He gives some examples:

“We build things in a cost-conscious way, even if it takes longer to build”. Vs. “We are willing to throw money at problems if it speeds us up”

“We value clarity over moving quickly” Vs. “We value taking action over analysis”.

He doesn’t suggest these should be top-down, but should emerge from a forming team.

Rapoport’s article is more about personal failure modes. He describes an analogy where different people might prefer one side or another of a lane, along multiple dimensions:

Freedom vs guidance
Strategy vs tactics
Relationship orientation vs execution orientation
Changephilic vs changephobic
Caution vs speed <br/.>

and knowing where you personally sit on those spectra mean you can be more prepared for your own personal failure modes. He leans heavily on the “freedom” side of freedom vs guidance as a manager, so he’s not worried about becoming a micromanager - he doesn’t prepare for that failure mode - but he’s learned to keep an eye on himself to make sure he’s providing enough guidance for team members who need it in a particular situation.

Again, note that the lane edges are not defined as “good vs bad” - freedom is good, but so is guidance. Caution and speed are both good. But there are trade-offs.

Are your hiring filters working? - Jonathan Hall

Once you are clear on the nature of candidates you’re trying to hire, it’s a lot easier to evaluate hiring processes - and once you have a documented repeatable process, you can evaluate and update iteratively.

As Hall points out, everything we do in the process - from the job ad to the application form questions and onward - is a hiring filter. But is it a good filter? Does it enrich for the candidates we actually want by filtering out the candidates that wouldn’t be a good match and preserving the candidates that would be a good match? Have you checked?

Product Management and Working with Research Communities

FILE NOT FOUND: A generation that grew up with Google is forcing professors to rethink their lesson plans - Monica Chin, The Verge

This article has been going around, and I’m curious to hear how many groups that do training have seen this already - students who grew up with search capabilities and (implicitly) object stores or flat namespaces are going into computational science courses with no comprehension of directory hierarchies, or that programs need to be told where to find files.

I think this is an interesting challenge when designing training for our existing tools, which very much rely on file system hierarchies. Yet I can’t agree with the premise of some of the scientists that Chin interviews, that directory structures are “computer fundamentals” that need to be learned, as opposed to a particular way of organizing metadata about files that many of the tools we’re still using require.

Linux block file systems (say) support directories, but files are just inodes and blocks, with human-friendly grouping and naming implemented by having some files be lists of other files and directories along with human-readable names. Object stores make this more explicit - “directory structures” are just syntactic sugar.

We know that a strict hierarchy can be awkward in practice for organizing files - sometimes files belong in multiple directory trees - so from very early on with POSIX file systems we had to give the directory structure mental model an escape hatch, with symlinks. Even when happily using directory structures, search with find and grep (and git grep and cousins) are pretty widely used tools. And heaven help you if you’ve realized you organized files into directories along the wrong dimensions, because “refactoring” a directory structure is a labour-intensive and error-prone process. An awful lot of data management systems are moving towards something that looks more like search + flat-ish name spaces that students are used to.

Is this mental-model mismatch something your group has seen in training sessions? How has your team handled it? Does anyone still organize their email into folders? Reply and let me know.

Teaching by filling in knowledge gaps - Julia Evans

Chin’s article above is really valuable because it connects a problem people are having (understanding directory structures and why they even exist) to their existing knowledge and experience (familiarity with accessing data and files primarily via search and flat namespaces). There’s a knowledge and experience gap that needs to be filled.

As Evans points out in this article, making progress with basic knowledge and then filling in the gaps as needed is how most of us learn computing (or physics, or..). So paying attention to what they do know is as important as what they don’t, because then you can connect to that knowledge:

One mistake I see people make a lot when they notice someone has a knowledge gap is: <br/.> -notice that someone is missing a fundamental concept X (like knowing how binary works or something)
-assume “wow, you don’t know how binary works, you must not know ANYTHING”
-give an explanation that assumes the person basically doesn’t know how to program at all
-waste a bunch of their time explaining things they already know

Evans says that she herself focusses on small bitesized things that uncover one core concept - often what’s behind an abstraction - with realistic use cases of how someone might have run up against this, and so what they likely already have some familiarity with.

I’m not sure if that approach suggests any particular way to approach teaching directory structures. Show how indexes like locatedb work, how a search for a filename goes - especially for a case with a non-unique filename, and then talk about directory hierarchies?

Cool Research Computing Projects

Nobel Prize for Climate Modeling - Steve Easterbrook

This year’s Nobel Prize in Physics went, in part, to the kind of work that RCT teams support - climate modelling. This is a section from Easterbrook’s upcoming book describing Syukuro Manabe’s early work on general circulation modelling.

Research Software Development

Profilerpedia: A map of the Software Profiling Ecosystem - Mark Hansen

Hansen presents a tabulation of 130+ (!!) different software performance profilers which collectively output in 109 (!!) different file formats, and 90 different tools for visualization of profiles. There are also ~100 different converters available.

Hansen is hoping to (a) let people wrestling with profilers or visualizers that there might be other options out there for them - at the very least other visualization tools they should know about - and hoping to help kickstart development of more profile-format converters.

How Big Tech Runs Tech Projects and the Curious Absence of Scrum - Gergely Orosz

I’m a big fan of the analogy between research software development and tech companies (including startups). In both cases, there’s an element of something new to what’s being built, and R&D - figuring out the problem while building an initial solution.

Another parallel is that you can’t necessarily start with two-week stories. An MVP for the new groundwater simulation code is going to take a little while, and there’s going to be a lot of very waterfall-y planning at the start.

Orosz has consulted and worked at several companies, and ran a survey of 100 companies. He sees big tech, and venture-funded scaleups and startups often having a Plan → Iteratively Build → Ship methodology rather than a purely agile approach like Scrum or Kanban. (Also, everyone hates Jira, and everyone uses it anyway).

There’s a lot more interesting material in his distillation of his survey - eventually as tech firms get large they adopt product rather than project managers, the importance of team autonomy in choosing how they work to their morale (even though they tend to converge on a common model per sector!) the varying division of responsibilities of managers. If you’re interested in how teams build stuff at tech companies, it’s worth a read.

Research Data Management and Analysis

Why you should use Parquet files with Pandas - Tirthajyoti Sarka

Regular readers will know that this newsletter is adamantly anti-CSV. Using text files to store structured data is a problem from the point of view of documentation, integrity/enforcing constraints, performance, space usage, and usability.

The usual recommendation is databases, including embedded DBs like SQLite or DuckDB. But! If you really want to maintain the a-table-in-a-file structure, especially where you’d be typically operating mostly on subsets of columns at a time, Sarka would like to introduce you to Parquet.

Parquet files are binary self-describing columnar files that have been used for years in the Big Data/Hadoop area, and are easily accessed in python (with pyarrow) or R (with arrow). The data is stored by column, not by row, which makes reading subsets of columns fast, and greatly improves compression. In addition, several columnar databases (including DuckDB) will operate on parquet files out-of-the-box.

Even if you’re just going to read in the entire file, Sarka shows how reading it into a Pandas data frame is 2-10x bigger for 10-100MB files:

Sarkar shows reading from a CSV can be 2-10x slower than reading from a Parquet file, a factor that generally grows with size.

Research Computing Systems

Helping Apache Servers stay safe from zero-day path traversal attacks (CVE-2021-41773) - Michael Tremante, Cloudflare Blog

Heads up that Apache 2.4.49 released mid-Sept is susceptible to a remote code execution attack that’s being seen in the wild, if you’re using that you should downgrade to earlier versions or upgrade quickly to 2.4.51.

Sharing the wealth of HPC-driven apps with flexible as-a-service models - Dell, HPC Wire

This is a sponsored vendor article - so, you know - but it highlights UMich, University of Florida’s, and SDSC’s successful service-driven HPC models: Michigan’s Open OnDemand and ITS Cloud Services program, U Florida’s similar Open OnDemand program, and SDSC’s cloud based version. Building these service-based approaches to compute provisioning is not just doable but done, and there are starting to be enough use cases to learn from.

Path Building vs Path Verifying: The Chain of Pain - Ryan Sleevi

Just as last week’s Facebook outage taught a lot of people about BGP, certificate issues are the way most people learn about certificate chains and verification. This year’s Let’s Encrypt old root certificate expiry went better than Facebook’s BGP issues, but there were still a lot of problems. Here’s an article from last year where Sleevi describes a similar issue, what went wrong with early OpenSSL releases, how certificate graphs should be handled, and what happens when corners are cut.

Emerging Technologies and Practices

The ultimate guide to GitHub Actions authentication - Michael Heap

GitHub Actions are increasingly being used by research computing teams for CI, CD, bundling releases, and other things (have a particularly interesting use case? Let me know!).

Here Heap walks us through the different authentication and authorization methods, and their pros and cons. His proposed ranking is:

GITHUB_TOKEN - the default, but you probably should know how it works and downscope permissions, and understand its limitations
GitHub Apps - Has some additional features like lets you perform actions as a user, abd actions can trigger new workflow runs
Personal Access Token (PAT) - very flexible, but has long lived tokens and probably means the use of shared accounts.

Calls for Submissions

Open Call for New User Project Proposals, Advanced Quantum Testbed, Letters of Intent due 5 Nov

From the call:

The Advanced Quantum Testbed at Lawrence Berkeley National Laboratory (Berkeley Lab) announces the second open call since the inauguration of its user program for new user project proposals. In collaboration with AQT’s expert team, testbed users have full access to the hardware and software, participate in its evolution, and advance the science enabled by quantum computing.

The 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2021) - 17-19 Dec, Guangzhu China, papers due 31 Oct

Topics of interest include

Networking and Architectures
Software Systems and Technologies
Algorithms and Applications
Security and Privacy

Events: Conferences, Training

The 34th International Workshop on Languages and Compilers for Parallel Computing - 13-14 Oct, Virtual, $50 Registration

Talks include progress on runtime and programming languages for parallel computing, accelerated architectures (including OpenMP for FPGAs), multi-target compilation including ARM and RISC-V, and more.

International RSE Day, 14 Oct, various

Happy RSE day, to all who celebrate!

The international council of RSE associations has apparently decided to celebrate Interational Research Software Engineer day the second Thursday of October each year. This page lists events in the Netherlands, Nordic countries, the US, Germany, and Australia.

19th Annual Workshop on Charm++ and its Applications - 18-19 Oct, Virtual

Free virtual workshop on all things Charm++, NAMD, ChaNGa, and OpenAtom.

ARDC Skills Impact and Strategy - 26 Oct - Free ARDC Carpentry Connect Australia, 2 Nov - Free

Two sessions hosted by the Australian Research Data Commons for those interested in training. The first is a set of sessions on measuring training outcomes (notoriously difficult to do well!). The second covers hosting Carpentries events, for those in the Carpentries community or considering joining.

18th Annual IFIP International Conference on Network and Parallel Computing (IFIP NPC) - 3-5 Nov, Paris, €50 virtual attendees, €450 on-site

Everything from data centre networks and NVM storage systems to cache management for SSDs.

Random

As we get closer to hallowe’en and the terrors that holds - the case for proportional fonts for programming. As if the thought isn’t horrifying enough, the author says that using proportional fonts meant switching from spaces to tabs. Truly, it is a season of damned souls.

Semantic code search for C/C++ code bases - weggli.

Updates to GitHub Releases, including the option of autogenerating a first draft of release notes.

AWS Batch do’s and don’ts, including cutting costs, debugging, making sure your images & containers aren’t a bottleneck, and when to use various tools.

Cindy Sridharan’s delayed best of 2020 tech talks, including on some RCT-relevant topics like monitoring and change data capture.

Web apps in Pascal, for some reason.

Fit 6000 tiny bit-serial cores onto a single FPGA, for some reason.

A git implementation in awk, for some reason.

Interesting review of a new terminal emulator, warp, which doesn’t seem like it’s for me but it’s fun to see some quite different UI and design choices made around what’s become a pretty staid space.

Post-quantum cryptography is already available for curl.

Another small/medium-sized academic/public sector HPC procurement goes to Azure - this time University of Bath, after a pilot project last summer.

The Insane Innovation of TI Calculator Hobbyists - what can be done with a 10MHz revamped 8080 CPU and 32K RAM.

A searchable database of (so far) 1,818 public incident reports/statements.

That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

Jobs Leading Research Computing Teams

This week’s new-listing highlights are below; the full listing of 151 jobs is, as ever, available on the job board.

RCT Newsletter

#95 - 8 Oct 2021

The benefits of process; Hiring & Interviewing; Values, Tenets, & Guardrails; Filling in gaps: Students unfamiliar with directories; Profilers; Whither scrum in tech; Parquet for Pandas; GitHub Actions AuthN

Managing Teams

Product Management and Working with Research Communities

Cool Research Computing Projects

Research Software Development

Research Data Management and Analysis

Research Computing Systems

Emerging Technologies and Practices

Calls for Submissions

Events: Conferences, Training

Random

That’s it…

Jobs Leading Research Computing Teams

#95 - 8 Oct 2021

The benefits of process; Hiring & Interviewing; Values, Tenets, & Guardrails; Filling in gaps: Students unfamiliar with directories; Profilers; Whither scrum in tech; Parquet for Pandas; GitHub Actions AuthN

Managing Teams

Product Management and Working with Research Communities

Cool Research Computing Projects

Research Software Development

Research Data Management and Analysis

Research Computing Systems

Emerging Technologies and Practices

Calls for Submissions

Events: Conferences, Training

Random

That’s it…

About This Newsletter

Jobs Leading Research Computing Teams