#110 - 19 Feb 2022

Missed delegation opprtunties; Good candidates is hard work; Diffrax; Concurrency in OpenMP and Rust; A successful CentOS-Rocky Migration; DAOS for HPC Object Storage

Hi!

I won’t lie; when I started offboarding myself from my current job a few weeks ago I was a little anxious. I was pretty sure that team members would step up and learn quickly. It felt relatively unlikely that we’d discover gaps that could threaten near-term milestones. One worries nonetheless.

Instead, I’m really impressed with how well everything has gone. If anything the team has come together even closer, rallying to fill such holes as my departing creates. There’ll certainly need to be another hire - the team is losing a person’s worth of effort, after all. But the big picture work, both technical and stakeholder facing, is in extremely capable hands. I’ve felt increasingly superfluous this past week. That’s the best possible outcome.

But that also means that I’ve been surrounded all month with examples of things I could (should!) have been delegating and documenting earlier. Activities I thought only I could perform are in fact somehow getting performed. Things I could have usefully have documented ages ago, are only now finally getting written up. It’s not a disaster, nobody died of it. And yet there were real missed opportunities for team member growth and for me to have spent more time focussing on strategic goals.

That’s regrettable, but I’m not raking myself over the coals for it (much). Managing is like performing research, or building technical solutions. You never get to a point where you’ve learned enough to stop honing your skills. There’s only continued growth, or stagnation. I’m leaving this job, project, and team in better shape than I found it. I’m leaving the role a better manager than when I arrived, too.

With that note (this newsletter will stop being all about me next week, I promise!), on to the roundup!

Managing Teams

The Inverse Relationship of Quality and Quantity of Candidates in the Hiring Funnel - Nadyne Richmond

In an increasingly tough hiring market, we can’t just post a job ad somewhere and hope that we get a flood of applications from amazing candidates. That approach works in faculty and postdoc searches, where the jobs are incredibly scarce and applicants are plentiful. But research computing is different. The world’s awash in technical job openings. A “Fire and forget” approach to getting candidates for jobs isn’t enough.

Richmond points out that the quality of candidates for her job is much higher for the labour-intensive, lower-quantity methods of recruiting candidates - such as candidates like the manager contacted directly, come from internal referrals, or were contacted by trained recruiters.

As we progress in our career we can build a network of people we know would be good hires or who we trust to recommend good hires. But we can only know so many people! And we want to be able to hire from outside of our own network, if only to make sure we’re getting broader perspectives and expertise into the team.

What have you found works for you? Do any readers in academia have any experience with using recruiters to hire research computing and data team members? What other approaches have ben successful?


Managing people - Andreas Klinger

This is a bullet-point list aimed at new managers, and so there’s some basics here - but it’s always useful to review the basics, and we always have new members joining the list (hi!) including new managers.

I like this list because it expresses a lot of ideas that have come up on the list before, but in ways I haven’t read. I wouldn’t necessarily express all of those thoughts this way, but it’s useful to get another take on ideas. Some highlights:

  • As a manager, everything is your fault
  • You manage processes; you lead people
  • Processes are expectations made explicit
  • In every discussion/project/problem/situation, it needs to be clear who makes decisions
  • Avoid back and forth
  • Trust through transparency
  • Don’t confuse autonomy and abandonment

I think the last one I chose, “don’t confuse autonomy and abandonment”, is particularly relevant in academic areas of research computing. We tend to expect and grant a large degree of autonomy - which is great! But that doesn’t mean not giving our team members what they need to succeed, including as much context and, yes, guidance and direction as they require. The trick is not to always be giving them more guidance and direction than they require.


8 Tactics to Help Manage Perfectionists at Work - Alexandria Hewko, Fellow Blog

Our community, at the intersection of research and tech, tilts towards perfectionism. Being committed to doing things well is great! And, like all good things, it can be taken to an extreme which becomes dysfunctional.

I think we all can recognize the signs of perfectionism; Hewko suggests an approach to leading a perfectionist on your team. First is (as with all team members) to acknowledge, to yourself and to them, their positive traits, and lean into those strengths where possible. Then, Hewko tells us, it’s necessary to set a bit of direction. Coach them to delegate (and help them avoid micromanaging the delegated task), and gently take control of their priority lists so they’re not rabbit holing down unnecessarily. Build and use lines of communication by making sure to build trust in your one-on-ones so they can share what’s going on, give constructive feedback, and let them know that coming up short is ok. Finally, make sure they’re in the right role and working on the right things for their skills and tendencies.


Product Management and Working with Research Communities

Toolkit for Cross-Disciplinary Workshops - Borhane Blili-Hamelin, Beth M. Duckles, Marie-Ève Monette

Research computing and data work is inherently cross-discipline. In our line of work success requires that at least two fields, the computational and the domain science, come together. In larger collaborations there might be many fields. In those cases it too often falls to the technical team, the ones charged with doing much of the the actual deliverable work, to be the glue connecting the communities. It’s hard work knitting together a team with diverse expertise, requiring skills that fall well out of the realm of the technical.

This work by Blili-Hamelin et al outlines their approach to hosting cross-disciplinary workshops. It could be very useful for large collaboration kickoff meetings too, or exploratory meetings looking for a point of collaboration.

A key point of the report is that the beginnings of cross-disciplinary discussions can feel slow (excruciatingly so in my experience!). But much of the actual output of the collaboration is exactly the development of a common language and shared mental model that allows translation between the groups. Groping towards that shared mental model is the slow initial part. Rushing that phase of discussions undermines the collaboration’s potential, while building and documenting it clearly and strongly will make everything else easier.

But doing that takes some preparation:

We found that cross-disciplinary workshop activities needed to be sequenced and built to make attendees feel comfortable learning new perspectives and empowered to experiment together with them.

They have a whole plan of action in the report, I own’t try to summarize here, but sending out pre-surveys and pre-reads, having ice breakers, lots of questions and curiosity, and brainstorming all play a role. They also have an interesting take on impromptu panel discussions; the “fish bowl” discussions.


Research Software Development

Diffrax - Patrick Kidder

We’ve talked more than once recently about JAX here (#98, #103) and it’s undeniably cool. But in research computing what’s valuable about a package isn’t the tool itself but what it enables.

We’ve seen PDE simulations taking advantage of JAX (#98). Here’s a differential equation solver - ODE, stochastic DE, or controlled DEs that’s built on JAX. It uses the capabilities of that library to expose a coherent API, and to allow fast, JITted ODE solves in Python on CPU, GPU, or Google’s TPUs.


Building for the 99% Developers - Jean Yang

In research computing, it’s easy to watch what happens in the tech industry, the cool tools being used and approaches taken, and feel like we’re in the backwater.

Yang reminds us that the stuff that gets a lot of attention as the cool new technology or process in tech isn’t necessarily super widely adopted even in tech. The approaches that the FAANG or other big companies take are to solve specific problems, in environments with very different tradeoffs than our own. As Yang says, “There is no gold standard development environment”. There are only problems to solve, and a variety of choices of tools with which to solve them. The point is to solve the problem, not to use something cool.


Seamless, Fearless, and Structured Concurrency - Evan Ovadia

This is an interesting article on threading which talks about “fearless” (race-free) and “seamless” (can access all data) concurrency, and connects OpenMP threading, which many of us are familiar with to approaches for concurrency in Rust and Pony, and uses all of those ideas to describe how the authori is implementing concurrency in their own language called Vale. It’s a quick but interesting walk through a few different approaches to threaded concurrency.


Pathways to Enable Open-Source Ecosystems (PEOSE) (22-572) - NSF

Its fantastic to see increasing funder interest in providing funding open source ecosystems. If only the call was for longer than 1 or 2 years and directly funded maintenance. Still, bit by bit..

The goal of the PEOSE program is to fund new OSE managing organizations, each responsible for the creation and maintenance of infrastructure needed for efficient and secure operation of an OSE based around a specific open-source product or class of products. The early and intentional formation of such managing organizations is expected to ensure more secure open-source products, increased coordination of developer contributions, and a more focused route to impactful technologies. … Importantly, the PEOSE program is not intended to fund the development of open-source research products, including tools and artifacts. The PEOSE program is also not intended to fund existing well-resourced open-source communities and ecosystems. Instead, the program aims to fund new managing organizations that catalyze community-driven development and growth of the subject OSEs. The expected outcomes of the PEOSE program are (1) to grow the community of researchers who develop and contribute to OSE efforts, and (2) to enable pathways for the development of collaborative OSEs that could lead to new technology products or services that have broad societal impacts. OSEs can stem from any areas of research supported by NSF.


Research Data Management and Analysis

PipelineDP Brings Google’s Differential-Privacy Library to Python - Patrick Zhang, InfoQ

Google and research institutions don’t have a lot of situations in common, but “having lots of data and wanting to be able to let researchers analyze it with some conditions” is one of them. Research groups who gather health or social sciences data may want to allow other users analysis over the data with some certainty that privacy violations will be rare. Differential Privacy grants that, under certain conditions.

IBM and Microsoft have already released differential privacy libraries; here Zhang tells us about Google andn OpenMined open sourcing their PipelineDP package, which plugs in nicely to Spark as well as being a standalone Python library.


Research Computing Systems

OS upgrade for KLONE - Nam Pho, U Washington Research Computing

Interesting - UWs new cluster, KLONE, was running CentOS 7, and with the CentOS 8 changes had to figure out what to do. They ended up going with Rocky 8 over some holiday downtime, and it’s been running for a month or so in GA. It sounds like there were a few re-compiling issues for users code but it went pretty smoothly. Love to see a migration story that’s boring from the user point of view. Do we have anyone from UWRC here? I’d be very curious to get the “behind the scenes” reporting from this migration.


Emerging Technologies and Practices

What Will AMD Do With Programmable Logic and other Xilinx IP? - Timothy Prickett Morgan, The Next Platform

So both Intel and AMD now have significant FPGA investment - Intel with the 2015 purchase of Altera, and now AMD’s purchase of Xilinx.

I won’t comment too much on this, except that it’s potentially of interest to the community and TPM’s take is always very informed by deep experience with the technical computing market.

I don’t think it’s controversial for me to say FPGAs haven’t really found a niche as standalone general purpose accelerators. But there’s been real success targeting particular niches like genomics. FPGAs have a long history in telecom and now smart network-cards suggesting they’d be valuable in datacenter scale networking. And TPM brings up Intel’s previous experience with “Frankensockets”. That particular experiment wasn’t super successful, but that doesn’t mean the idea of blending the programmability of a general purpose core with on-socket programmable logic, by either vendor, couldn’t be really interesting.


Intel Targets DAOS Object Storage at more than HPC - Daniel Robinson, The Next Platform
Garage: An Open-Source Distributed Storage Service you can Self-Host to Fulfill Many Needs - Garage Team

Two very different kinds of object storage for different kinds of research computing - HPC and data resiliency - needs both came up this week.

I’m surprised to search the archives and see that DAOS hasn’t come up before. DAOS is the open-source Distributed Asynchronous Object Storage that came out of the US DOE labs (LANL, I think?) and is being commercialized by Intel. The idea is to take advantage of the capabilities of Open Fabric Interface and Persistent Memory/NVMe to provide distributed data via a number of different models, from fairly classic object store “blobs” or MPI-IO to directly exposing distributed arrays and key-value stores.

High level view of the DAOS architecture, with a distributed control plane and engine serving MPI-IO, object store, and native array or key value needs

POSIX filesystems on individual workstations are amazing pieces of technology. They work so well that most people can safely avoid ever thinking about them - people don’t restart their laptops because their filesystem is acting wonky. But they’ve never scaled to large clusters, and big scientific workloads don’t really need them - e.g. MPI-IO doesn’t require (or even really benefit from) a POSIX filesystem on the back end. Something that can help nudge scientific workloads towards other storage approaches, like object store but also like array or key/value for structured data that needs slicing and dicing, could be really cool.

When Aurora finally gets up and running, DAOS will be managing 220 PB of data, so we’ll see how it works under real load. And from there, Robinson tells us that Intel plans to target other areas such as AI. It’s not hard to imagine that data storage could be the first place we really see “HPC/AI convergence”.

For other uses, Garage is an easily-self-hosted geo-distributed object store more for web or data hosting in a highly-available way. It’s very decidedly not for HPC applications. It has basic S3 compatibility, so it’s relatively straightforward to tie into other applications. For teams looking to make data assets available with self-hosted solutions, this might be a choice worth looking into; has anyone played with it? I know there are Nextcloud providers on the list.


Random

Oh this is great news - arXiv papers are getting DOIs.

Interesting online book on implementing Algorithms for Modern Hardware, covering in significant depth instruction-level parallelism, profiling, numerical arithmetic, external memory (as you know, I think this is going to get increasingly important), cache, SIMD parallelism, and some interesting case studies. I really like the section on cache associativity, which is pretty unintuitive and can lead to surprising performance bugs.

Google Slides is Trolling You.

Using Mermaid syntax in Markdown on GitHub to get rendered flowcharts or graphs or sequence diagrams or gantt charts is now live. We’ve already started using this. No more firing up Keynote or draw.io just to draw a few boxes with arrows!

Write your own sudo. CR/LF: LF as a way to pass some needed time on teletypes while the carriage literally returned.

Helpful ssh scripts like ssh-ping, ssh-certinfo, and more: ssh-tools.

Perl code that is only syntactically correct on Fridays. because perl is a cursed language.

It’s easy to forget that some very basic things in computing are genuinely hard - e.g. there was very active work improving float-as-a-string algorithms for printing floating point numbers up to the 90s, and even figuring out the optimal way to draw circles on a pixelated screen is nontrivial.

I’m a sucker for useful interactive explanations of things, it’s a great example of the power of computing for insight. Here’s an interactive article explaining GPS.

If you do technical computing and you want to try your hand at functional programming, OCaml is a really solid choice. OCaml from the very beginning.

Universal Paperclips: Help the AI running a paperclip factory achieve sentience.

I’m really fascinated by these new open-source no-code/low-code tools for building internal applications, like Appsmith. Is anyone using things like this for internal or researcher-facing applications? How’s it going?


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.