#141 - 15 Oct 2022

Making an Onboarding Plan; Performance Improvement Plans; Paying for Tools isn't 'Overhead'; Manage Your Career Growth; Strategy Should be Meaningful and Ongoing; 10 Rules For Workflow Applications; Software Mention Database; Datalake-in-a-box; Homomorphic Encryption & Confidential Computing

In #139, we talked about some general purpose first steps when taking on a new responsibility. Let’s talk about the other side of that, now — how to make sure someone we’re bringing on to tackle some new challenges can be as productive as possible.

Back in #135, we talked about beginning with the end in mind: having clear goals for the person taking on a new role, and what it should look like in the day to day. When scoping out a new responsibility, it’s important to define what success looks like after they’re fully onboarded. That onboarding timeframe may vary: say after 3-4 months for an individual contributor, maybe after 6 months or even longer for someone with significant leadership duties. That doesn’t mean they won’t continue to grow after that! But by that point the goal is that they’re functioning competently in the role. Our job, then, is to help them get there as quickly as possible, for the sake of the team and so that they can start feeling settled in and successful. That means designing an onboarding process.

There are four kinds of information to gather to design the onboarding process:

  • Make a list of all the knowledge, skills, and behaviours the person will have to learn to be at that successful end-state
  • Make a list of the people that the person should meet and talk to to understand the inputs to and impact of the work they’re taking on
  • Take an inventory of the resources (people, but also training materials or funding for training) you have available to help bring the person up to speed
  • Identify meaningful self-contained projects, increasingly non-trivial, that they could work on that would give them concrete places to apply what they’re learning, and that could be milestones to being an onboarded, independent team member.

Building these lists will be iterative: items on one will suggest items that need to be added to another. Both you and the people you’ve had help you scope and define the responsibility can add to them and discuss them. For a software developer, the lists might look like:

  • Understand one part of the code base extremely well and have a general understanding of how other components behave, know the CI/CD system and how we write tests, know the project management tools, how the team meetings are run, be able to run standups and retrospectives, understand our expectations around code review, pair programming and mentoring juniors
  • Meet everyone on the team, key users, and key potential users (oops, forgot about knowing how our user experience testing works), key administrative contacts
  • Resources include more senior developers, our existing practice of pair programming juniors, previous architectural decision documents, (oops, forgot about knowing where those documents are and expectations about documenting and how decisions are made), an institutional subscription to a MOOC platform where they can learn intermediate level skills in our programming language and version control systems, (oops, forgot about introducing themselves to our institutional professional development staff and other peer teams)
  • Projects include a complete solution to a small ticket (including adding a test and running the test suite), running a standup, building feature X that users have been waiting for, updating the onboarding documents, writing an architectural decision document, running a user test

Put together an onboarding list with these components, staging learning resources and opportunities to meet the community with meaningful contributions

With those in mind, and collaborating with the people who are helping define the responsibility, you now have the raw materials to put together an onboarding plan. Stage the resources such that they are always building towards application in some meaningful contribution. Introduce them to people in the (increasingly) broader community so that they have a growing connection to the inputs and impact of the work of your team. At earlier steps in the onboarding plan, feel free to be quite prescriptive (e.g. preschedule a pair-programming session with one of the seniors on pieces of the code base); at later steps, the training wheels start coming off, and you can just list the people resources and let the new person make the connections and learn the material themselves. (You can always become more involved where needed.)

As always, none of this is rocket surgery - it simply requires recognizing that onboarding is important, putting together a plan with the resources you have available, and then committing to seeing it through.

What’s the best onboarding plan you’ve seen (or experienced)? What’s the worst? Hit reply or email me at jonathan@researchcomputingteams.org and let me know. And as always, members of our community can feel free to schedule a quick call with me if they have thoughts or questions.

And with that, on to the newsletter!

Managing Teams

Should I Create a Performance Improvement Plan for My Direct Report - Lara Hogan

We, myself included, don’t talk often enough about underperforming team members. It’s uncomfortable! But letting a team member struggle indefinitely is bad for them, and bad for the rest of the team.

In that context of reticence about poor performance, Performance Improvement Plans (PIPs) frequently have a pretty bad reputation as a perfunctory, performative step taken before firing someone. The following situation isn’t uncommon: A manager repeatedly but only indirectly discusses with a team member some area of underperformance. The team member remains blissfully unaware, or thinks (perhaps reasonably, given the signals they’re getting) that it isn’t a big deal. The manager, increasingly frustrated by nothing changing, eventually blows up, goes to HR, and gets a template for a performance improvement plan. With HR’s guidance the manager fills it out, setting a timeline where change is unlikely to be able to happen effectively, and presents it to the team member as a fait accompli. Eventually the team member sees the writing on the wall and, to the ineffective manager’s relief, resigns, or the team member is fired.

The above is a pretty crummy scenario, and is absolutely the fault of the manager. It shouldn’t be and doesn’t have to be this way. Getting HR involved absolutely should be a final step, but in the scenario above, many earlier steps were skipped.

It’s vitally important to give clear, actionable feedback when something isn’t going well. It’s unfair to the team member not to. There are several methods (#73) including Hogan’s own that focus on behaviour and impact, typically bookended by questions. And waiting until things are too late to change (#06) isn’t fair to you or them.

Hogan describes the pros and cons of PIPs, and while documenting your feedback and working with them on a plan to improve in the needed area can be very useful, getting HR involved in the process is a pretty hard-to-retract step.

If you find yourself in the position of seriously considering one, Hogan has a nice decision framework here:

  • Are you confident they will not be able to meet expectations in a reasonable amount of time? Then starting the process of managing them off the team in one way or another is appropriate.
  • If you think they could, then ask yourself: “What haven’t I stated clearly/bluntly yet to this person about what I expect of someone in this role?”
  • Meeting with the team member and clearly discussing the expectations and consequences is the next step.
  • Then start thinking about how to get from here to there.

There are also useful sections on what to do if the team member doesn’t see the importance of meeting the expectations, or if the two of you don’t have a shared understanding of the expectations. This reminds me of a nice article on the five necessary conditions for improvement mentioned in #36.

By the way, the discussion above assumes that the thing they’re being given feedback on to improve is the performance of their own personal tasks. Adapting to new work is hard, and a fair amount of patience is justified in giving people a chance in getting up to speed. In that case I’m fully on board with Hogan’s clear, measured approach.

But there’s another domain in which people might be getting corrective feedback - about how they’re behaving to other team members or the broader community. If they’re being toxic to others, do not feel any such need for patience. Correct them once or twice, each time explaining the expectations and consequences, and document the behaviour. The next time it happens, begin the firing process, whatever that means in your institution. Do not allow toxic behaviour to damage the team or your research community.

Nonprofits May Need to Spend a Third of Their Budget on Overhead to Thrive — Contradicting a Donor Rule of Thumb - Hala Altamimi and Qiaozhen Liu

Even the nonprofit world is slowly coming to terms with the fact that spending money to support team members in being more effective in their work (on tools, support personnel, or externally provided services) is not “wasting money on overhead” but is instead “getting the job done effectively and efficiently”. And yet too often I hear horrified cries like “but that could pay for a new compute node/a summer student” when the topic of paying money for things that team members need comes up.

And that’s how we end up with centres that claim “our people are our greatest resource” with high turnover because they don’t have the tools they need to get their job done well.

Funders have a lot to answer for here, but the cultures of our teams sustain this attitude too.

Managing Your Own Career

How To Drive Your Own Career Growth In These 6 Easy Steps - Lighthouse Blog

This blog post lays out a great plan for making sure your career growth is a priority - it’s worth considering for ourselves, but also keeping in mind for our team members. Like so much in management, there’s no silver bullet here, it’s just having a clear plan in mind and constantly advancing it bit by bit.

  • Start from even before the beginning - during the interview process, ideally, making it part of your negotiations. (Unfortunately, it’s a lot easier to advocate for your career growth in a new job than it is to start advocating for it in an existing job where you’ve been stagnating for a while.)
  • Bring it up in your one-on-ones
  • Have clear career growth goals, and break them up into small manageable pieces. Take note of progress
  • Get your manager involved, in ways that are easy for them. Small budget items, introductions, keeping an eye out for future opportunities.
  • Write down your plans and share them with your manager
  • Consider your manager, and align your growth priorities to their needs

By the way, the article starts by pointing to charts from PwC and Deloitte on what’s important for employees, highlighting career progression and growth as being top priorities. It’s worth comparing these graphs to those from our own community, Understanding Factors that Influence Research Computing and Data Careers (#133). They’re indistinguishable. Team members in our profession want the same things everyone wants - growth, recognition, flexibility, purpose, pride in their work and their team, manageable workloads, the tools to do their job well, and enough compensation that they don’t feel undervalued. We’re constrained somewhat on pay, but all of those other pieces are at least partially under our control.

Product Management and Working with Research Communities

Strategic Planning Should Be a Strategic Exercise - Graham Kenny
How Nonprofits Can Keep Strategy Front and Center - Alan Cantor

As readers will know, I think strategy is incredibly important for our teams. And because of that, I’m constantly vexed by two frequent problems with strategic planning in our profession.

The first is that when it’s done at all, far too often the output is feel-good pablum that reveals no insights, results in no changes, and is forgotten about as soon as the final version is printed. It’s a Potemkin compliance exercise for funding, not a way of collectively identifying problems and prioritizing solutions. The word “excellence” appearing more than, say, three times in the document is a terrific indicator of this particular failure mode. So is a described “strategy” of continuing to do some of everything. And if the resulting document isn’t routinely being cited internally during discussions in meetings about a decision being made? Well, it’s because the document has already been forgotten and no one took it seriously anyway.

The second problem is easier to fix. It happens when a good and valuable strategy came together, and was acted on, but the process was treated as a one-off. The temporary infrastructure put together for creating the strategy document — lines of communications to the community, meetings, documents — are torn down or slowly start to decay in place. And then after some arbitrary period of time (three years, five years, or a new leader coming in and asking for one), the whole process has to be recreated from scratch.

The first issue with the “one-off” approach is that those lines of communications with community, the community ownership of the plan, the trackers for problems and proposals — those are all in and of themselves strategic assets that need to be maintained and nurtured.

And that matters because of the second issue. We are well beyond the point where research computing and data strategy development can be a twice-every-decade, fire-and-forget sort of exercise. Plans need constant monitoring. Priorities need periodic updating.

These two HBR articles this week touch on different aspects of this. Kenny reminds us not to think of strategic plans as fixed, and to aim for insight as part of the process. Cantor urges us to revisit strategy at every board meeting (for nonprofits; think advisory committee in our context), with a reminder of the current mission and real discussion of a strategic issue every meeting.

Note that this requires an active and engaged advisory committee! In some future set of issues we’ll talk about creating such engagement. We can learn a lot from how the best nonprofits develop their boards.

Coding For Economists - Arthur Turrell

A nice resource on Python aimed at the economist community, covering econometrics, time series data, text analysis, databases, geospatial data, and more. Also covered is writing computational results up in blog posts, technical reports, and papers.

Digital humanities needs equality between humanists and technicians - Urszula Pawlicka-Deger
Give technicians their due - Catrin Harris, Research Professional News

One of the best reasons to not silo ourselves into “research computing”, “research software”, “research data management” camps is that we’re all wrestling with similar problems. And it’s not just us.

Pawlicka-Deger’s article drives home the need for greater respect for technical career paths in digital humanities, and Harris’ for technicians quite broadly in research.

Research Software Development

Ten simple rules and a template for creating workflows-as-applications - Roach et al.

Increasingly, scientific computations involve the orchestration of workflows containing many tools. This is especially true in data analysis, but even increasingly in simulation.

This paper gives ten rules for creating these workflow applications, paraphrased somewhat below:

  • Empower users to troubleshoot them by using logging well
  • Make it easy to configure the workflow by providing a default configuration file with overrides available on the command line (for experimentation) or with an updated configuration file (for version control)
  • Don’t hardcode assumptions, and provide utility workflows for any necessary environmental requirements (databases, input files)
  • Don’t silence real-time output - users need that feedback
  • Ship software with a package manager to simplify installation
  • Isolate the individual steps
  • Include a simple test dataset
  • Provide profiles for different execution environments
  • Allow sophisticated users to interact directly with the workflow manager
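As a rough illustration of the logging and configuration rules above, here is a minimal Python sketch using only the standard library. It is not taken from the paper: the option names (threads, input_dir, profile), the JSON file format, and the function names are all hypothetical, chosen only to show defaults being overridden first by a configuration file and then by command-line flags.

```python
import argparse
import json
import logging

# Hypothetical defaults for an imaginary workflow; these names are assumptions,
# not taken from the paper or from any particular workflow manager.
DEFAULTS = {"threads": 1, "input_dir": "data", "profile": "local"}


def load_config(config_path=None, overrides=None):
    """Merge defaults, then an optional JSON config file, then CLI overrides."""
    config = dict(DEFAULTS)
    if config_path is not None:
        with open(config_path) as fh:
            config.update(json.load(fh))
    # Only options the user actually supplied (not None) replace earlier values.
    supplied = {k: v for k, v in (overrides or {}).items() if v is not None}
    config.update(supplied)
    return config


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy workflow entry point")
    parser.add_argument("--config", help="path to a JSON configuration file")
    parser.add_argument("--threads", type=int, help="override the thread count")
    parser.add_argument("--profile", help="override the execution profile")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Log to the console so users get feedback while the workflow runs.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    config = load_config(args.config,
                         {"threads": args.threads, "profile": args.profile})
    logging.info("resolved configuration: %s", config)
    return config
```

Calling main(["--threads", "4"]) keeps the default input_dir and profile but overrides threads, which is the "experiment on the command line, commit a config file for version control" split the rule describes.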

This is a very valuable distillation of best practices, with concrete suggestions and sample code. It actually reminds me a bit of a workflow-as-application counterpart to recommendations for a twelve-factor app for web applications (which held up remarkably well during a decade of rapid change): configuration stored in the environment, admin processes/backing services run by their own workflows, logs as recording of events…

New data reveals the hidden impact of open source in science - Chan Zuckerberg Initiative
A large dataset of software mentions in the biomedical literature - Istrate et al, arXiv:2209.00693

Staff at CZI have put together and organized an amazing dataset of 67 million mentions of software, from something like 16-20 million biomedical papers. The dataset and code are available on Dryad and GitHub. The methods are fun to read (for instance, clustering to deal with the various ways the name of a piece of software is given in papers). I expect that the real fun will be to see analyses of this dataset start to come out.

6 Best Practices to Manage Pull Request Creation and Feedback - Jenna Kiyaso, Doordash

Some best practices from Doordash about handling PRs; it’s interesting to see how many companies and communities have converged on the same good practices:

  • Descriptive and consistent names
  • Clear PR title and description (which means: tightly focussed PR; a template with problem/solution/impact/testing plan is suggested)
  • Keep PRs short
  • Manage PR disagreements through direct communication
  • Avoid rewrites by getting feedback early
  • Request additional reviewers to create dialogue

Research Data Management and Analysis

Modern Data Stack in a Box with DuckDB - Jacob Matson
Delta Lake 2.0: An Innovative Open Storage Format - Matthew Powers

A couple of data lake articles this week:

Matson writes a fun post describing how to implement a current state-of-the-art data lake environment on a laptop or single node using Meltano for ELT pipelines, dbt for data transformations, Superset for data exploration and visualization, and duckdb as an embedded OLAP columnar database. This could be used as a way of playing with some new tools in a realistic-ish broader context, as a prototype for a data lake solution for a research group, or even as the first steps of building a real production stack.

For data lakes where a single columnar embedded database might not be enough, Powers gives the summary of what’s new in Delta Lake 2.0, a storage format based on Parquet files. I’ve always thought of this as being very much part of the Spark ecosystem, because that’s how it started, but there are connectors to Presto and other data engines - and the ability to access or revert to previous versions is pretty valuable for our kinds of use cases.

Emerging Technologies and Practices

The Rise of Fully Homomorphic Encryption - Mache Creeger
Explained from scratch: private information retrieval using homomorphic encryption - Spiral Privacy
Constellation: The First Confidential Kubernetes Distribution - Felix Schuster, The New Stack

In research computing and data we’ve always relied heavily on multi-tenant systems, whether our own or externally provided. And now we’re increasingly asked to support sensitive data.

These first two articles give a good quick overview of Fully Homomorphic Encryption (FHE). FHE is a possible solution when you want to provide querying or even simple analyses of sensitive data on untrusted systems. Data is stored encrypted, using a scheme that preserves certain mathematical operations, and those operations are performed on the encrypted data directly. The server never sees the unencrypted data (or even queries).

There’s substantial overhead for using FHE, not least of which is that algorithms have to be rewritten to only use operations which are preserved under the encryption, but it’s surprising what’s possible under this scheme.
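To make "operations performed on the encrypted data directly" concrete, here is a toy Python sketch (Python 3.8+) of the Paillier cryptosystem. Note the swap: Paillier is only additively homomorphic, not fully homomorphic, and is not the scheme described in the linked articles; it is used here because it fits in a few lines. The primes are deliberately tiny and the random number generator is not cryptographically secure, so this is an illustration only.

```python
import math
import random

# Toy parameters: these primes are far too small to be secure and were
# picked only so the example runs instantly.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                            # standard simple generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lcm(P-1, Q-1), private


def _L(x):
    """Paillier's L function: L(x) = (x - 1) / N for x = 1 mod N."""
    return (x - 1) // N


MU = pow(_L(pow(G, LAM, N_SQ)), -1, N)               # private; needs Python 3.8+


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = random.randrange(1, N)                   # NOT a secure RNG; toy only
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext using the private values LAM and MU."""
    return (_L(pow(c, LAM, N_SQ)) * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the plaintexts (mod N); no key is needed."""
    return (c1 * c2) % N_SQ
```

A server holding only encrypt(20) and encrypt(22) can compute add_encrypted on them without ever seeing 20 or 22; only the key holder can decrypt the result and recover the sum. Fully homomorphic schemes extend this idea to both addition and multiplication, which is where much of the overhead comes from.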

Confidential computing is another approach, relying on hardware support for secure enclaves or trusted execution environments within the server, which can’t be accessed by the OS or other tenants (even root), in which data is decrypted and acted upon. The downside here is it requires the hardened enclave to be hack-proof, and software has to be rewritten to support that; the upside is that arbitrary operations can be run and the enclave-supporting hardware is becoming more common.

Rewriting code to support the enclave isn’t trivial, but there are libraries, databases, and increasingly entire stacks being put together to make the confidential computing aspects more easily adopted - in the third article Schuster describes Constellation, an entire confidential computing k8s distribution.

Six European computing centres will be hosting quantum computing sites.


A record-breakingly powerful gamma-ray burst 2.7 billion light years away measurably disturbed Earth’s ionosphere.

ar5iv, from arXiv labs: read (v1 of most) papers in reactive HTML5 instead of PDF by changing the ‘x’ in arxiv in the link to ‘5’.
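If you want to do that substitution in a script, a minimal Python sketch might look like the following. The function name to_ar5iv is hypothetical, and the sketch assumes links of the form https://arxiv.org/abs/... being mapped to the ar5iv.org host.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url):
    """Rewrite an arxiv.org link to the matching ar5iv.org link."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv link: {url}")
    # Swap only the host; the path (e.g. /abs/<id>) is left untouched.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))
```

For example, to_ar5iv("https://arxiv.org/abs/2209.00693") gives the ar5iv address for the software mentions paper linked earlier in this issue.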

SQLite famously is “open source but not open contributions”. libSQL is a fork of SQLite which aims for wider community contribution.

Hello world in Python, going down through the Python VM to stack traces to Windows console and system calls through vector font rasterization and window management

A fascinating three part series on the steam engine and why it wasn’t invented earlier when several key pieces were known as far back as the first century.

Working with hexagonal grids.

TensorStore, a Google-developed “one stop shop” for efficient reading and writing of multidimensional arrays from Python.

Interesting blog post summarizing a paper about how software developers use CoPilot (and presumably similar tools). The paper distinguishes between acceleration of what the developer was already going to do, and exploration of what to do next.

Rendering galaxy clusters, black holes, and Saturn in Minecraft.

That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,


About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 193 jobs is, as ever, available on the job board.