#60 - 5 Feb 2021

Defining requirements for hiring; Importance of writing skills; The talent myth; Finding mentorship; Getting adoption of your tools is a lot of work; Programming books w/ pandoc

Hi!

I hope you’re having a good week. Below is the continuation of our discussion on hiring, stemming in part from the more formalized pipeline that we’re working on; you can also skip to the roundup.

Last week I started with the basic premise - you have a hypothesis that you’ve found a good candidate (and they have a hypothesis that your team would be a good match for them). Then, as scientists, our job is to try to disprove the hypothesis.

If you accept that the hiring process is about both sides being able to detect a mismatch as early on as possible, a lot of next steps fall into place. The single most important thing we can do to ensure a good hire is to have a really clear, unambiguous description of what the job requires and what we’re looking for - so unambiguous that you could hand it off to someone else and they’d end up with basically the same post-interview short list of candidates that you would.

A clear description of what you need will help people who would be good for the job find it and self-select by applying, build agreement with your current team members around what you are collectively looking for in a new hire, and help you separate out those who would be good additions to the team from those who wouldn’t as early on in the process as possible. And sitting down and hashing out with your team the job description and requirements is a great way to have an open conversation about work in the team - both as one-on-one conversations and then collectively with the group.

Like so many things in managing, putting together an internal job description and list of requirements isn’t rocket surgery, it just requires thinking things through carefully. The most common issue I see with job descriptions in research computing jobs - and I look through a lot of research computing job ads - is that they are way too specific about technical requirements and not nearly specific enough about anything else. When we’re hiring, we’re hoping to choose a team member who will contribute in a number of ways to the work of the team over at least a few years, and our job requirements should cover what it will take to be successful over that period of time.

To counteract this tendency to focus on technical details, it’s useful to imagine having a successful candidate in the job three to six months later who’s working out really well, and to work backwards from there to what you’re looking for. This will help you and your team focus on what the person will be like to work with, and on the kinds of gaps the team currently has that they would fill.

It will also usefully tend to downplay the requirement that they “must have three years of [tool X] experience”. No one is going to be fully productive in a new role in the first few months - even tool X experts will take some time to learn your particular code base/architecture/system - and this gives someone who has demonstrated related skills a couple of months to learn tool X passably well. Do you really see yourself hiring someone, in research computing, who isn’t broadly capable enough to pick up enough of a language/system/process to start contributing in three months if they’ve done related things with different technologies in the past? If not, why make tool X a hard requirement?

Manager Tools has a podcast episode on writing simple job descriptions which is very useful. They suggest starting with five questions (tweaked here for context):

  • The reason the [team] created this job was
  • The most important ways a person doing this job should spend their time are
  • The 2-3 most important duties of this job are
  • What this job takes to be successful is
  • The simplest, easiest way to see if this job is being done well [in three months] is

In our line of work, since the tendency is to focus on the technical skills, I find it helpful to make the team skills explicit in the job description; in our context that might look like:

  • The person interacts with the team by:
  • The person interacts with researchers/users/the community by:
  • The person brings the following skills/knowledge/background to the team:

These last questions, especially the first and second, are about cultural fit. When cultural fit is just used to vaguely mean “like us”, it can be a huge source of bias in interviewing. But when it’s clearly and explicitly defined, it is useful for both sides as a way to clarify expectations about how people work together on the team. (We’ve gone so far as to start an evolving slide deck on how the team works together - it needs work in sections, but the discussion around the slides has been very useful!)

The needs of your job and team are going to vary, but in research-adjacent environments we often have cultural expectations around collegiality:

  • Highly collaborative
  • Willing to pitch in
  • Willing to respectfully disagree on methods and aims, while also
  • Willing to let others have ownership of their part of an effort

and about independence:

  • Willing to learn what needs learning, and/or get help from someone who knows what needs to be known, to do the job at hand
  • Able to take high level objectives and constraints, suitably described, and flesh out the necessary steps themselves
  • Willing to initiate communication with others as needed for input
  • Willing to take external input into account and change or correct their approach

These expectations are neither objectively good nor bad, low nor high, to have of team members, but they are common in our line of work. People who work very differently - who expect very well-scoped tickets to work on in disconnected chunks, or to be able to toil away on their own in a corner without interacting with others - are unlikely to enjoy or succeed in environments with these expectations, and vice versa. It’s best to make your team’s working expectations extremely clear at the outset, both for your own clarity in evaluating candidates and for transparency to candidates about what the job entails.

Once you have an understanding of the requirements for the role, you can start prioritizing them into “must-haves” and “nice-to-haves”. It’s important to be ruthless about pulling items out of the “must-haves” list! An overlong must-have list limits your applicant pool unnecessarily, and divides your focus in too many directions when deciding on applications. Is your team really unable to support a new team member with related skills as they learn about tool X and platform Y?

A complication when prioritizing requirements in research computing is that it’s pretty common to be open to hiring someone with any of two or three different kinds of skillsets. In data science groups you might be interested in growing your team’s skills into NLP or computer vision; a systems team might be open to a security expert, someone with deep OpenStack experience, or someone who has deployed a monitoring and alerting system before. That’s ok; when distilling this down into a single job description, you just break out the common requirements and activities, list them first, and then frame the rest as “the candidate must fit one of the following profiles” and list those separately. Ideally we’d prioritize one profile and list only it, or run three separate job ads, but that is sometimes out of our control and we have to work with the situation as it exists.

You can now distill the activities and requirements into a job description. This document is the starting point for a discussion with the whole team, and with other stakeholders who would be working with the new hire. Do they see requirements you’ve missed? Do they have different priorities for those requirements than you initially thought? Are there areas of disagreement that must be understood and resolved?

The next step after having agreed-upon requirements is to think about how to evaluate them; that will come next week.

And now, on to the roundup!

Managing Teams

Writing Is One of the Best Things You Can Invest In, as a Software Engineer. The More Experienced People Become, the More They Tend to Realize This. - Gergely Orosz

Speaking of non-technical skills being underrepresented in technical job descriptions… Communicating well is an absolutely essential part of a job in any interdisciplinary endeavour like research computing, and written communication is becoming vital as teams go remote. That doesn’t necessarily mean particularly good grammar or vocabulary - we’re an international community, many of us speak English as a second language, and those are things that can be cleaned up with tooling support afterwards. But being able to logically make a point, express an argument, or describe a process is essential.

In this twitter thread, Orosz lists a number of resources attempting to convince the reader of this point, and other resources that he feels can be used to help improve written communication skills.


Talent is largely a myth - Avishai Ish-Shalom

In research we’re pretty good at understanding that people grow in capabilities over time, and we typically avoid the tech company trap of talking about “Hiring the Best Talent”. But when we focus our job searches on people who can solve our immediate technical problems the moment they walk in the door - which is easy to do if we’re not careful - we can backslide into this mentality.

Ish-Shalom reminds us that:

  • Talent is multidimensional
  • Talent isn’t static
  • Talent isn’t linear

and if we’re trying to hire “the best” candidate during our job search, we don’t pay enough attention to our team’s ability to help the candidate grow and to let their strengths develop.


Managing Your Own Career

Maximize your mentorship: search and secure - Neha Batra

I don’t think it’s controversial to suggest that as research computing managers we are given precious little guidance or useful advice. If we want those things, we have to seek them out ourselves.

As with putting together a solid list of job requirements, the steps for finding and recruiting mentors to give us some advice aren’t surprising or challenging - there’s no “One Weird Trick for Getting Mentorship”. You just have to figure out what you’re looking for and who you’d like to talk to, and then approach them seeking some advice.

People, even busy people, are generally pretty open to having occasional short conversations with and giving advice to people who are earlier in their career path and have questions. And in other contexts, we know this - those of us trained in academia generally wouldn’t think twice about contacting a more senior author on a paper we were interested in, or a colloquium speaker, to ask some questions about how they did the science. But we’re so weirdly conditioned around management not being real valid work in academia that we’re pretty reticent to approach people seeking advice on those topics.

Batra goes through the steps of figuring out where you want mentorship, prioritizing potential mentors, an initial ask for a discussion, and asking for another conversation in a couple months.


Product Management and Working with Research Communities

If you build it, promote it, and they trust you, then they will come: Diffusion strategies for science gateways and cyberinfrastructure adoption to harness big data in the science, technology, engineering, and mathematics (STEM) community - Kerk F. Kee, Bethanie Le, Kulsawasd Jitkajornwanich

Software packages, like ideas, don’t in fact speak for themselves. Getting any sizeable number of people to adopt a new idea, new practice, or new tool requires an enormous amount of coordinated communication effort. In this paper, Kee, Le, and Jitkajornwanich describe what they found to be key practices for increasing the adoption of research computing tools - in this case science gateways and cyberinfrastructure. And why would we build tools if not to have them adopted?

Based on an analysis of 83 interviews with 66 administrators, developers, scientists/users, and outreach educators of SG/CI, we identified seven external communication practices—raising awareness, personalizing demonstrations, providing online and offline training, networking with the community, building relationships with trust, stimulating word‐of‐mouth persuasion, and keeping reliable documentation.

Relatedly, I’ve recently discovered the Open Source Guides which have brief but good overviews of what you should be thinking about to get users for your open source software, building communities, best practices for maintainers, and developing formal governance when it’s time.


What does a scientific community manager do? Check out the CSCCE Skills Wheel and accompanying guidebook! - Centre for Scientific Collaboration and Community Engagement
Community Engagement Planning Canvas - Tamarack Institute

The skills needed to manage a scientific community are of immediate interest to us as we try to engage with a research user community for software, systems, curated data, or anything else.

The reason scientific community management is so important, in the CSCCE’s estimation, is that:

… science is inherently a community-based endeavor. The generation, validation, and dissemination of knowledge requires a network of diverse roles and a range of community configurations to meet specific needs - whether those needs bridge across disciplines, career stages, institutes or other boundaries.

The skills wheel workbook is a short 19 pages covering skills needed by an individual or a team engaging a community in the following areas, all of which are needed:

  • Technical
  • Interpersonal
  • Communications
  • Program Management
  • Program Development

At a more tactical level, the Tamarack Institute has a community engagement planning canvas for planning and designing particular community engagement activities.


How I use Pandoc to create programming eBooks - Sébastien Castiel

A good overview of a workflow for generating nice longform technical documentation in a variety of formats with markdown + pandoc.
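
If you haven’t driven pandoc this way before, the core of such a workflow is pleasingly small. Here’s a minimal sketch of the idea using the pypandoc wrapper rather than the pandoc CLI that Castiel drives directly; the filenames and title metadata are hypothetical placeholders:

```python
# Render a markdown manuscript into EPUB and standalone HTML with pandoc,
# via the pypandoc wrapper; "book.md" is a hypothetical concatenated manuscript.
import pypandoc

common_args = ["--toc",                      # table of contents
               "--highlight-style=tango",    # syntax highlighting for code listings
               "--metadata=title:My Book"]   # hypothetical title metadata

pypandoc.convert_file("book.md", "epub", outputfile="book.epub", extra_args=common_args)
pypandoc.convert_file("book.md", "html", outputfile="book.html",
                      extra_args=common_args + ["--standalone"])
```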


Research Software Development

Fulfilling the promise of CI/CD - Charity Majors, on the Stack Overflow Blog

Majors, who makes regular appearances on the newsletter, has a very clear view on the value of CI/CD - and, in particular, CD:

The point of CI is to clear the path and set the stage for continuous delivery, because CD is what will actually save your ass.

[…]

Until that interval [LJD - from writing new code to testing and at least some users working with the new code] is short enough to be a functional feedback loop, all you will be doing is managing the symptoms of dysfunction.

She points out that having good CI testing and CD is a matter of priorities, not skill sets:

The teams who have achieved CI/CD have not done so because they are better engineers than the rest of us. I promise you. They are teams that pay more attention to process than the rest of us. Great teams build great engineers, not vice versa.


Effective Property-Based Testing - Russell Mull, Auxon
Generating Web API Tests From an OpenAPI Specification - Henrik Strömblad, Nordic APIs

We’ve talked about property-based testing with particular packages before on the newsletter - most recently in #56. These articles distill the use of property testing to some high-level considerations - in the first case quite generally, in the second case for RESTful API testing in particular.

Mull’s article gives good advice on how to approach property-based testing in general. It’s a good, deep article - here are some pieces of advice that stuck out to me (with a small sketch after the list):

  • Develop a problem-specific generator library for your package that reflects the cases you care about rather than depending on automagically generated generators
  • Use combinators (map, filter, tuple, choose, just) to get as much use out of your generators as possible
  • Use flatmap/bind for data dependencies like internal consistency
  • Start small
  • “It doesn’t crash” is a perfectly good starting test
  • Test system boundaries, then the whole system
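
To make a couple of those points concrete, here’s a minimal sketch using the hypothesis package; the interval formatter and parser are hypothetical stand-ins for your own code, not anything from Mull’s article:

```python
# A sketch of a problem-specific generator built from combinators, plus a
# simple "round-trip and don't crash" property; parse/format are made-up
# stand-ins for whatever code you actually want to test.
from hypothesis import given, strategies as st

# Build the generator you need from primitives with tuple/map/filter,
# rather than relying on automatically derived strategies.
finite_floats = st.floats(allow_nan=False, allow_infinity=False)
intervals = (st.tuples(finite_floats, finite_floats)
             .map(sorted)
             .filter(lambda pair: pair[0] < pair[1]))

# flatmap/bind handles data dependencies: draw a size, then a list of exactly that size.
sized_samples = st.integers(min_value=0, max_value=5).flatmap(
    lambda n: st.lists(finite_floats, min_size=n, max_size=n))

def format_interval(interval):
    """Hypothetical code under test."""
    lo, hi = interval
    return f"[{lo!r}, {hi!r}]"

def parse_interval(text):
    """Hypothetical code under test."""
    lo, hi = text.strip("[]").split(", ")
    return [float(lo), float(hi)]

@given(intervals)
def test_interval_round_trip(interval):
    # Start small: it shouldn't crash, and formatting then parsing should round-trip.
    assert parse_interval(format_interval(interval)) == list(interval)
```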

Strömblad’s article looks at REST APIs defined in OpenAPI. From my point of view, API specification languages like OpenAPI are a crucial first step for defining interfaces: they allow clear tests and expectations, and can reduce the need for boilerplate code in services or clients. Strömblad describes the use of a new package, Humlix, to start developing simple property-based (rather than example-based) tests for APIs.
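
For a sense of what spec-driven property testing looks like in practice, here’s a minimal sketch - using the schemathesis package rather than Humlix, which I haven’t tried yet, and with a placeholder URL standing in for wherever your service publishes its OpenAPI document:

```python
# Property-style API tests generated directly from an OpenAPI document,
# using the schemathesis package (not Humlix); the URL is a placeholder.
import schemathesis

schema = schemathesis.from_uri("http://localhost:8000/openapi.json")

@schema.parametrize()
def test_api_matches_its_spec(case):
    # Builds a request from the schema, sends it, and checks the response
    # against what the specification promises.
    case.call_and_validate()
```

Run under pytest, this generates requests for every operation in the spec and flags responses that don’t conform.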


Results for the first Stanford Software Survey - The Stanford Research Computing Center

Results of a Stanford-wide survey on the use and development of research software. Some headline takeaways:

  • 25% of respondents think of themselves as research software developers
  • 97% of groups responding use something they think of as research software, and 91% think research software is vitally (68%) or moderately (24%) important to their work
  • 71% use or develop open source software or services
  • Only 25% of people who think of themselves as developing software feel they have received enough training in software engineering best practices;
    • 77% are confident or very confident in their use of version control, but
    • confidence drops to 42% with unit testing, and
    • to 33% with continuous integration
  • Even at Stanford, only 29% of respondents think the University’s level of support for software-development needs is excellent (6%) or good (23%)
  • There are nice treemaps of research software used
  • 48% of groups have included costs for software development in a grant, 38% of groups have hired someone specifically to develop software

Research Computing Systems

Kobalos – A complex Linux threat to high performance computing infrastructure - Marc-Etienne M.Léveillé and Ignacio Sanmillan, ESET
Kobalos — Indicators of Compromise - ESET

A really sophisticated piece of malware targeting HPC clusters has been found by the security firm ESET, who has named it Kobalos. It targets multiple operating systems including Linux, FreeBSD and Solaris, and perhaps even AIX and Windows; it contacts a command-and-control server and will try to infect other systems. It may or may not be related to the rash of HPC centre compromises last year; Kobalos may in fact predate those. Once a system is infected, its ssh client is backdoored.

There’s a Twitter thread tl;dr if you like, and a white paper; the GitHub link above has a variety of hashes that can be searched for. The good news is that it seems to be relatively easy to scan a system or network for Kobalos.

Unrelatedly, I assume you’ve already done this, but if you haven’t, update your sudo (again) - this most recent vulnerability is a really bad one.


Achieving 11M IOPS & 66 GB/s IO on a Single ThreadRipper Workstation - Tanel Põder

Last week we mentioned how fast modern drives are in the context of floating point deserialization. Here, Põder points out that I/O throughput can now be limited by CPU and memory rather than the disk, in a quest to get the highest IOPS and throughput he could on a workstation. In doing so he gives a very nice and thorough overview of the entire I/O path (except the filesystem, which he bypasses here - imagine a database server doing raw block access).

With just one of the Samsung 980 Pro PCIe 4.0 SSDs he was able to get 1.149M IOPS or 6.811 GiB/s throughput, and that was only keeping the CPU 1% busy. To keep pushing, he profiles kernel activities, switches to direct I/O, adds a PCIe 4.0 quad-SSD adapter and tunes it to avoid a bottleneck at the root complex, giving brief introductions to psn, 0x.tools, lspci, dstat, fio, and hdparm along the way.


Emerging Data & Infrastructure Tools

Speed up pip downloads in Docker with BuildKit’s new caching - Itamar Turner-Trauring

BuildKit will now cache directories during builds the way that, say, Travis-CI or other CI/CD systems will; that can greatly speed up builds that make small changes to dependencies. The example given here is for Python applications (that’s the topic of Turner-Trauring’s blog, after all) but is widely relevant.


Calls for Proposals

Annual Modelling and Simulation Conference 2021 - 19-22 July, Hybrid Fairfax VA USA and online, Papers due 1 March

Tracks of particular relevance to us include

  • High Performance Computing and Simulation
  • AI and Simulation
  • Communications and Network Simulation
  • Modeling and Simulation in Cyber Security
  • Theoretical Foundations for Modeling and Simulation

JuliaCon 2021 - 28-30 July, Virtual, Free; proposals due 23 March

The CFP is looking for talks, lightning talks, mini symposia, workshops, posters, and BoF sessions particularly on Julia applications or approaches to:

  • Biology, bioinformatics, health, medicine, and health disparities
  • Data analytics and visualization
  • Finance and economics
  • General computing
  • Industrial applications
  • Julia’s compiler, tooling, and ecosystem
  • Numerical and mathematical optimization
  • Scientific computing
  • Software engineering best practices
  • Statistics, machine learning, and AI

Events: Conferences, Training

Research Squirrel Engineers - An independent squirrel network for RSEs in DH and archaeology - SORSE talk, 11 Feb 16:00 UTC

This short SORSE talk describes the nascent Research Squirrel Engineers community forming in DH and digital archaeology.


European Molecular Biology Organization Lab Management Training - Various dates through 2021

EMBO is one of the few organizations out there doing leadership training for scientists; the sessions aren’t cheap but are very well regarded. Online sessions are coming up for:

  • Laboratory Leadership for Group Leaders
  • Laboratory Leadership for Postdocs
  • Laboratory Leadership for US Scientists (NA friendly timezones)
  • Negotiation for Scientists
  • Self-Leadership for Scientists
  • Self-Leadership for Women Scientists
  • Project Management for Scientists
  • Applying Design Principles to Schematic Figures (might be useful for architecture diagrams?)

Future proof - various dates in Feb and March, Virtual

Of possible interest for subsurface scientist trainees you work with - Agile* has software development classes for subsurface science coming up:

  • Intro to Geocomputing, 5 half-days
  • Digital Geology with Python, 4 half-days
  • Digital Geophysics with Python, 4 half-days
  • ML for Subsurface, 4 half-days

Random

A modern, literate-programming take on Forth.

A software development veteran shares opinions that she has changed (and some she hasn’t) over a decade+ in the business.

A columnar, Rust-based distributed computing package focussed on ETL jobs - ballista. Think dask but more or less just for ETL (for now).

krunvm creates and manages lightweight VMs from OCI-compliant container images.

A twitter thread describing getting cron’s man page and code back into sync 30 years later.

The Titus Brown lab has an example of moving a Python research computing project away from setup.py and towards pyproject.toml/setup.cfg.

More examples of cloud providers actively going after HPC customers - Google Cloud has a machine image specifically for HPC jobs.

Degrees of Success: The Expert Panel on the Labour Market Transition of PhD Graduates, by the Canadian Council of Academies, is an interesting and in-depth look at the labour market outcomes for Ph.D. students in Canada; different countries will have different (but probably not wildly different) results. Very interesting for trainees and those working with trainees.