#97 - 22 Oct 2021

Giving interview questions in advance; How new managers fail their team; Looking for leadership jobs; Journal for digital humanities software; Adding type hints to a big code base; Jupyter notebooks and reproducibility; Data managers at Janssen

Hi!

So, two final aspects of our recent hiring situation that I haven’t had room to mention earlier: we’re giving out interview questions ahead of time, and interacting with peer teams is still hard in this hybrid/remote world.

The first of the two is likely the more surprising and/or controversial. We’ve changed how we hire a bit, including sending out some key interview questions ahead of time. This was initiated by a team member who was interviewing co-op students. I was initially pretty skeptical, until I attended the resulting interviews; the discussions were so much better, and went so much deeper, that I wanted to keep trying it. Our initial attempts with interns went well enough that we’ve hired our first contract staff member using this approach too (and sent a pretty detailed candidate packet).

So far this seems to work particularly well for behavioural questions and others of that type (“Tell me about your ‘favourite’ bug you found and had to fix”, etc.). Because candidates know the questions that are coming, they’ve had time to come up with good initial answers and have them fresh in mind; the followup questions - where the real work of the interview is done anyway - then come much faster, and so can go much deeper in the same amount of time. It also feels (on both sides, seemingly) much more like a conversation than an interrogation.

We’ve talked in the newsletter about how useful it is to be explicit ahead of time with the candidate about the interviewing and hiring process (e.g. #84, talking about candidate packets); now the discussion is, for each part of the process, how detailed the explicit up-front information should be, given what we want to accomplish. Obviously there are some kinds of technical assessment that can’t be done with full up-front disclosure, but even there we can be more detailed than we’ve been in the past.

We’re still early on in this process, especially for permanent staff, but the results seem to be really promising for interns; they are so far succeeding at least as well as those hired the old way, the acceptance rate appears to be markedly improved, and everyone seems happier with the interview process (although of course candidates - maybe even especially successful candidates - are unlikely to tell us our process is terrible). Doubtless we’ll find situations where it doesn’t work; I suspect in particular that for leadership positions where the responsibilities are more wide-ranging and assessment criteria are less clear, doing this would be difficult.

Second, while we’ve pretty much got our own internal team communications down in this new remote/hybrid world, interacting with peer teams is still an unsolved problem for us. We were caught off guard by one of our peer teams wanting the same student - this would have come up much earlier when we were all sharing a work space.

I think in general, most successful teams have got the hang of remote and hybrid communications now in situations where (a) people work closely together - within teams, or between very closely collaborating teams; and (b) people interact seldom - external collaborators, or resources at other institutions, where “reaching out” always took a fair bit of effort.

What seems like an unsolved problem for us is the in-between state - people who would interact occasionally and with relatively little effort going into it. Peer teams working nearby; groups giving colloquia that we occasionally attend. For the peer teams we’ve tried to set up some internal events to help maintain the communication flows, and that hasn’t worked so far. I could expand my peer one-on-ones, but that would make the managers a bottleneck (and often the right topics wouldn’t come up). We can’t all just chatter in the same Slack - that would become an unwieldy number of people. Something like an all-of-organization Donut kind of thing could work, but it seems like a big ask of team members when most of the time nothing would come of it.

What do you think - have you tried sharing interview questions or other aspects of interviews ahead of time, or are you concerned that it’s a terrible idea? Have you found something that works to replace what used to be occasional interactions in the workplace? Let me know, and if you give permission I’ll share with the readers.

But that’s the last, I promise, of the stuff about our co-op hiring mini-fiasco. On to the roundup!

Managing Teams

How New Managers Fail Individual Contributors - Camille Fournier

Fournier has coached a lot of managers, and she shares some common failure modes of new managers:

  1. Doing all the technical design work yourself
  2. Doing all the project management yourself - your team members will need to learn those skills as they advance
  3. Neglecting to give feedback
  4. Hoarding information - intentionally or unintentionally
  5. Focussing on their own output and not that of the team

How to write (actually) good job descriptions - Aline Lerner

In an incredibly hot tech job market, it’s hard even to attract the attention of candidates who would be amazing additions to our team. Active recruiting can help with that; your other tool is your job ad. Lerner reminds us about writing good job ads:

  • Focus on attracting good matches
  • Have an “about us” section
  • Have an “About you & what you’ll do here” section - including specific and memorable descriptions of what they’ll do.

The audience paradox: in a hot job market it’s hard even to attract good candidates, while mismatched candidates are plentiful. Lerner argues it’s best to focus on attracting the good candidates.


Managing Your Own Career

How to find engineering leadership roles - Will Larson

Larson describes how to look for lead-of-lead type roles (20+ computing staff) in his world (tech industry), but I think it applies in our world, too:

  • Hearing from peers - part of what I want to do starting with this newsletter is create a community of practice of RCT managers where this can more readily happen
  • From listings - Ditto! - although Larson suggests here trying to find a referral of some sort rather than just a cold apply
  • Search firms - pretty unlikely in our line of work
  • Crowd-sourced searches - basically hearing from a broader range of peers
  • Sharing that you’re looking online - which obviously has downsides

Product Management and Working with Research Communities

Construction Kit: a review journal for research tools and data services in the humanities - Construction Kit Editorial Team

This is cool - a peer-reviewed, open-access journal specifically for research tools and data services in the humanities:

CKIT understands computational work as an essential infrastructural and intellectual part of humanities research within and outside of the digital humanities field.

They’re actively looking for papers:

The editors of CKIT invite authors to submit a review of a research tool or data service in the Humanities. Reviews should have a length of 1500 to 4000 words and can be written in English or German. Each review should address the underlying concept of the research tool or data service, the domain specific research questions it addresses, and the engineering practices it implements.


The Actual Next Million Cloud Customers - Corey Quinn

This is about cloud and AWS, but it’s really about early vs middle adopters of technological solutions, and how organizations can best serve the median potential client. I think there’s a lesson for Research Computing and Data (RCD) teams, too.

AWS famously has approximately four kazillion services that act as building blocks for - well, almost everything. And for tech-savvy customers - the first million cloud customers - that’s exactly what they want! But to start making inroads into the next million - and the million after that, and… they’ll need something different. And that would be a big change:

The entire AWS sales and technical field teams would need to learn to have very different conversations with customers. But consider how unworkable the alternative is. If today’s sales and marketing motions continue doing what they’re doing, customers […] growing sense that AWS is and remains an “infrastructure company” instead of a company that builds services that empower its customers but can’t articulate how it might do so.

As RCD teams start trying to make inroads into broader ranges of research and research groups - something that is absolutely essential if RCD is to unleash the full potential of research - there’s going to have to be a decreasing focus on “selling” bytes and flops and disk space and undifferentiated “software development”, and more of a focus on services that make sense and are obviously valuable to a non-big-compute-savvy wet-lab biologist or historian or pure mathematician. And what services make sense and are obviously valuable will be very different for those three folks.


Research Software Development

Tests aren’t enough: Case study after adding type hints to urllib3 - Seth Larson

I love a good migration story, and this is a pretty big one, both in terms of change size (eventually touching every function in the code base) and package importance (about 5,500 packages - and like 2/3 of a million repositories - depend on urllib3).

Python has a lot of drawbacks, as everything in computing does, but one thing I really like about it for research software development is that it enables an incremental path to maturity. You’ve been able to add type annotations to Python code since Python 3.5 (PEP 484), and a lot of tooling (IDEs, and static checkers like mypy) can then highlight problems for you.
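
For instance (a minimal sketch - this function is made up for illustration, not from the urllib3 codebase), annotating a function lets mypy flag callers that pass the wrong type:

    from typing import Optional

    def parse_retries(header: Optional[str]) -> int:
        """Parse a retry count from a header value, defaulting to 3 if absent."""
        if header is None:
            return 3
        return int(header)

    parse_retries("5")  # fine
    parse_retries(5)    # mypy: Argument 1 to "parse_retries" has incompatible
                        # type "int"; expected "Optional[str]"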

In this article, Larson describes the process, and leads with why it was worth it (“hundreds of engineer hours across several months”) - even with 1800 tests and 100% test coverage of lines of code, this process still identified code quality issues and bugs they hadn’t been aware of.

The scope of the effort made it challenging. It couldn’t be done all at once, but doing it incrementally was tricky too, given how typing (or the lack of it) “spreads” across files as they’re imported into others. Their approach involved:

  • Keeping a list of known-annotated files, running mypy on them, and filtering out errors from files not on the list
  • Developing conventions for temporarily disabling warnings on some functions, with comments explaining why (type mismatches get flagged with too little context to see the reason immediately), and being explicit about which errors are being ignored - both for tighter checking and as a form of documentation (a sketch of this convention follows the list)
  • Adding typing to the test suite, which in principle isn’t necessary for correctness of the library, but let them identify issues with typing
  • Resisting the easy out of “typing.Any”
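
As a rough sketch of that ignore-with-a-reason convention (the function here is hypothetical, not urllib3’s actual code): mypy lets you suppress one specific error code on a line, and running with --warn-unused-ignores later flags suppressions that are no longer needed:

    import json
    from typing import Dict

    def load_config(path: str) -> Dict[str, str]:
        with open(path) as f:
            data = json.load(f)
        # json.load() returns Any, which mypy --strict rejects when it's
        # returned from a function declared to return Dict[str, str];
        # suppress only that one error code, and document why, until
        # proper validation of the loaded data is added.
        return data  # type: ignore[no-any-return]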

TIL: 1.4 Million Jupyter Notebooks - Vincent Warmerdam
A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks - João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo, and Juliana Freire

Researchers and analysts need something like notebooks, and notebooks are by and large not super great for software engineering.

Warmerdam gives an overview of the paper by Pimentel et al., describing their massive effort of downloading 1,400,000 Jupyter notebooks from GitHub and trying to run them - only 24.1% ran at all, and only 4% generated the same results. Quoting Warmerdam quoting the paper:

The most common causes of failures were related to missing dependencies, the presence of hidden states and out-of-order executions, and data accessibility.

The paper itself also gives a number of suggestions for improving the quality of published notebooks.

I’m not really sure what to do about this. RStudio manages to provide a really accessible interactive environment and publishable notebooks, while also having really strong off-ramps for getting code into libraries and under version control outside the notebook. Julia’s Pluto notebooks are reactive, and so at least avoid out-of-order execution. But I don’t know if there’s any set of small changes to something like Jupyter that would improve matters.


Research Data Management and Analysis

Janssen’s Data Ecosystem and the Role of Data Managers - Allison Proffitt, BioIT World

Maintaining an RCT leader job board gives one a good view of the state of the research computing team job market. I’ve written before, in a post that would be worth updating, that research data management skills have never been more employable - firms, especially but not exclusively in regulated areas like biomedicine and finance, are hiring data management teams or even staffing entire data management organizations.

In this article, Proffitt writes about pharmaceutical company Janssen and the role of data managers in their R&D organization:

Janssen used to be a highly decentralized organization, Stojmirovic explained, with research data acquisition, storage, and analysis all centered on therapeutic area. The approach, he said, led to missing data management workflows, a variety of metadata schema, and inconsistent data curation and storage. As a result, data were not traceable up or downstream of an individual team and there was no way to search or compare data across therapeutic areas.

After years of this, they’ve created a central data ecosystem which has become key to the entire R&D organization, with data managers (along with, it sounds like, data engineers) stewarding the data and interacting with teams across the company:

“Data managers—and the Data Management team as a whole—are really at the center of this ecosystem, corresponding with everyone,” Stojmirovic explained.


ClickHouse Local - ClickHouse

Well this could be handy - use SQL to query and analyze CSVs, data through pipes, JSON files, Parquet files, and a number of other formats and structures using ClickHouse Local, based on the ClickHouse OLAP SQL engine. (h/t to a friend of the newsletter)
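
For instance, here’s a quick sketch (the file name, column names, and types are all made up; it assumes clickhouse-local is installed and on your PATH) of running an aggregation over a local CSV, wrapped in Python:

    import subprocess

    # Run SQL over a local CSV with clickhouse-local's file() table function;
    # 'CSVWithNames' means the first row holds column names, and the third
    # argument declares the (hypothetical) column structure.
    query = (
        "SELECT gene, count(*) AS n "
        "FROM file('data.csv', 'CSVWithNames', 'gene String, score Float64') "
        "GROUP BY gene ORDER BY n DESC LIMIT 10"
    )
    result = subprocess.run(
        ["clickhouse-local", "--query", query],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)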


Research Computing Systems

How BSD Authentication Works - Dante Catalfamo

I’ve worked at a couple of places where the research computing systems were Linux, but with a few *BSD systems scattered about - for running particular stacks, or even as login nodes. Unlike Linux, or even most BSDs, OpenBSD doesn’t use PAM (Pluggable Authentication Modules) for authentication, but has its own module system called BSD Authentication. Catalfamo walks us through how authentication modules work for OpenBSD.


Emerging Technologies and Practices

Extract the Ensembl gene catalog to simple tables - Daniel Himmelstein

A nice and practical example of GitHub Actions and GitHub Flat Data (#75) for research data - the GitHub Actions in this repo keep track of new releases of the Ensembl catalog, for a variety of species, and automatically generate summary gene tables using a series of SQL queries. The tables are output (in TSV and JSON formats) in branches corresponding to species and release.


Incident Review and Postmortem Best Practices - Gergely Orosz

If your team is thinking of starting incident reviews & postmortems - which I recommend if relevant to your work - this is a good place to start. Orosz reports on a survey of, and discussions with, 60+ teams doing incident response, and finds that most follow a pretty common pattern:

  • An outage is detected
  • An outage is declared
  • The incident is being mitigated
  • The incident has been mitigated
  • Decompression period (often comparatively short)
  • Incident analysis / post mortem / root cause analysis - often aiming for within 36-48 hours of the incident
  • Incident review
  • Action items tracked.

Current best practices seem to be:

  • Encourage raising incidents, even when in doubt
  • Be clear on roles during incidents
  • Define severity levels ahead of time
  • Have playbooks ready
  • Make time for staff to work on the review
  • Dig deep when looking into causes
  • Share analysis fairly broadly
  • Find or build tools to support incident handling

He then goes into some details of conversations with teams that are going beyond these practices - companies like Honeycomb which, providing tracing for other teams’ stacks, has very high uptime requirements (they publicly released an outage report for a five-minute outage!), amongst others.

A long article but worth a read.


How to make your microservice architecture end-to-end confidential - Edgeless Systems

As research computing and data expands into web applications and sensitive data - like health or social science data - there’s growing interest in building complex applications that handle sensitive information.

It’s interesting to see the growing tool set for doing this. Here the Edgeless Systems team, which writes open-source software for confidential computing, gives an example of a simple but nontrivial Kubernetes application built on their open-source stack - a database, a secure service mesh, an SGX-app control plane, and an application written in Go with their EGo library - making a complete secure application.


Events: Conferences, Training

Academic Data Science Alliance Annual Meeting - 10 Nov, Free, Zoom

The ADSA is having its annual meeting, with the topic of “Data science for social impact in university-based programs”.


Random

You can play Doom on all sorts of things now, but what medium would really be the best pairing for the mindlessly violent, nuance-free game… Oh, of course. You can now play Doom via Twitter.

On a happier note: fluid simulations of ducklings or goslings swimming behind their mother show that, spaced correctly, the ducklings ride waves and experience negative wave drag, getting little pushes forward.

A full JSON parser in awk.

The Perl community wants you to know it’s not dead, yet.

A teaching implementation of an x86-64 assembler.

An app store - from the 80s. That sold software on diskettes.

A programmatic CAD - really a programmatic interface to multiple CAD engines - with an IDE: cadhub.

Hmm… Calyx looks like it’s aiming to be LLVM for FPGAs, specifically supporting higher-level languages to design compute accelerators.

Summary of the larger-than-usual updates to PostgreSQL in the v14 update.

I’m already getting used to github.dev for a vscode-in-browser-directly-atop-the-repo experience; now vscode.dev (at least on Chrome, and presumably Edge) lets you have the same zero-install vscode experience on local files and repos.

Searching for the original FORTRAN compiler.

There’s a feasible proposal to get rid of the Python global interpreter lock; it’s a good article to read if you want a thoughtful but accessible explanation of why the GIL is still there.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.