Many people in our community — and in the broader research community we serve — are in pain this week. There’s another video of another Black man, George Floyd, begging for his life while being murdered by a police officer in Minneapolis. Here in Toronto a Black woman, Regis Korchinski-Paquet, died when what should have been a routine call resulted in a mystifying number of police officers showing up. With only police officers present in her apartment, she went over her high-rise balcony to her death, with her last words being, repeatedly, “Mom, help”. This is all taking place during a pandemic which is disproportionately killing and incapacitating Black people, Indigenous people, and people of colour because they have less access to jobs that can be worked from home, and are more likely to be living in overcrowded multi-generational homes.
So with news and social media being dominated by consequences of systemic racism, anti-Black violence in particular, and police violence in reaction to anti-police-brutality protests, a lot of people are feeling despair and anguish.
As managers, we are leaders of communities. Small communities, but communities nonetheless. We have a responsibility to members of those communities to let them know we support them and are here for them. It doesn’t take much to be a small bit of genuine help to someone really struggling. But we have to initiate the conversations. Our team members won’t open up to us about these topics until we’ve demonstrated we can have some kind of adult conversation about racism.
Doing or saying something is scary for many of us in research computing — who are overwhelmingly not Black and mostly white, which is a related conversation we need to have — because we are worried, reasonably, about getting it wrong. And it’s easy to make the excuse that because we don’t have Black team members (which… you know, same) it’s not something we need to address.
But most of us don’t have team members who have gotten sick with COVID-19 either, and we’ve certainly been addressing that. It’s been hard and uncomfortable and we didn’t get it all right the first time around and we did it anyway. You don’t necessarily know who’s hurting in your team and community or why. Not addressing a topic dominating the news and social media now doesn’t project professionalism, it just suggests discomfort or indifference.
I do not have great suggestions about what to say or do. I can offer some articles and collections of resources I’m finding useful:
I can also tell you what I’m doing at work. I’ve raised the issue at our all-hands meeting using words much like the above, and let people know they can talk to me about it if they need to. Unhelpfully, I sounded a bit awkward, even after practicing, but the next conversation will be easier. I’ve made a point of checking in a little deeper with people during one-on-ones, and doing a lot of listening. I’m listening for feedback even when it’s uncomfortable, and I’ll keep reading those materials, and others, to see what I can do better and how I can support change.
That’s not the best or even a particularly good way to address what’s going on now and what’s been going on for a very long time. It’s the bare minimum, and started too late. The challenge will come when making changes, then advocating for more change to peers and upwards. But it’s a start.
Have you started these conversations with your team? Or even better, had your team already felt comfortable discussing these topics? How has it been going?
Now on to the link roundup.
Three Steps for Leaders to Take in Emergencies - Lara Hogan
For longtime readers, this won’t come as a surprise - it’s similar to advice that’s been given since the start of the pandemic, when people’s time and mental space were occupied by the then-new pandemic and new family care responsibilities.
The overall idea is to make as few hard demands on people’s time and mental energy as possible for things that can be done in other ways.
And to take care of yourself:
Visualizing different one on one conversation types - Neer Sharma on Twitter
This is a good set of visualizations to keep in mind during any two-way conversation, whether it’s your one-on-ones with team members or otherwise. How does the conversation flow? Is one of you interviewing the other, or is there a more balanced interaction?
12 fully remote work observations as an engineering manager, three months in, in a company with distributed offices to start with - Gergely Orosz on Twitter
This thread on the challenges of moving to fully distributed teams for a year, even in a company that started with distributed offices, sounds a lot like the responses I got a couple of weeks ago about our experiences:
Time Machines & Leadership: 10 things I wish I knew at the start - David Boyne
Written for new managers, but it’s always worth revisiting the basics. David Boyne gives ten items he wishes he knew earlier, with 3-6 specific tips for each:
Stop Taking Regular Notes; Use a Zettelkasten Instead - Eugene Yan
The idea here is compelling but I’m not sure I have the discipline to make it stick.
The argument is that notes you keep are more valuable if they’re in a tool that allows linking between notes, like Roam or Zettlr. I could totally see that being true. But would I really make the time to cross-link the notes as they’re being written?
Has anyone tried a system like this? Is it really that much better than using Apple Notes or OneNote or Evernote and searching a lot?
Getting Down to Business in a University HPC Shop - HPCWire
A discussion with Brock Palen (of late lamented RCE-cast fame) at UMich about the completely-self-funded Great Lakes supercomputer.
A lot would have to change in our funding environment for this to be a widely-adopted model, but I’d really like to see it happen. Forcing more teams to be relentlessly focused on what researchers are willing to pay for - and to open researcher eyes to what stuff costs - can only be good for aligning research computing and research. Unfortunately, right now, this sort of model would devastate digital humanities work in Canada.
How you can help keep blogging alive and thriving - Marko Saric
Social media is around to stay, but with its downsides now clear, earlier web approaches to communications are coming back: blogging and small communities (then web forums, now Slack communities).
You might want to have a personal blog or one for your team or individual projects. I’m moving toward newsletters rather than my blog but I think the basic ideas are the same:
5 tips for effective customer support during a crisis - Amanda Cotter
We do a lot of customer support in research computing and many of the people we support are going through a lot - now and for the past months. Has your team seen more stressful interactions and seeming over-reaction by users and clients?
This is a set of hints for more traditional customer service roles but I think they apply to handling tickets or emails from researchers during tough times just as well:
7 practices you should follow for a successful code handover - Nicolas Carlo
Programming as Theory Building - Diogo Felix
These are interesting articles to read back to back.
Nicolas Carlo has his usual pragmatic information about legacy code - in this case, avoiding code becoming legacy code by executing a handoff between an outgoing developer and a new one. The key ones, I think, are:
The post by Diogo Felix is more theoretical/philosophical, based on a 1985 essay of the same name by Peter Naur. The argument is that programming creates code, of course, but that’s only one output, an incomplete representation of what was in the developer’s head. (Evidence: try revisiting someone else’s code.) The work of programming is theory-building (I think we’d say model building now). The developer(s) built a mental model of the problem and its solution.
Code becomes legacy, then, when the code is all that’s left. So it’s vital to transfer over the mental model as well as the code base when developers are leaving.
DBCore - Code generation powered by your database
I’m excited by the growth of “no-code”/“low-code”/code-generation tools for some pretty common tasks around data and services. How often as research computing teams do we need to stand up some services for remote access to databases of some sort? DBCore is a new one to me that generates code given a Postgres or MySQL database and schemas. The advantage of that is that the (golang) code can be modified and specialized to your needs. Similar tools, but with less customizability (for better or worse), are PostgREST and the super cool read-only Datasette.
Chan Zuckerberg Initiative Awards $3.8 Million for Open Source Software Projects Essential to Science - Chan Zuckerberg Initiative
This funding of basic software for research computing is so important and I wish public sector agencies would follow the lead. CZI focusses largely on tools that support life science research, but that includes tools with much broader reach as well.
Managers — Look at Your Engineers’ PRs - Padmini Pyapali
New research software development managers are torn between staying engaged with the development while not micromanaging or slipping back into the comfortable role of individual contributor.
Watching, maybe even reviewing, PRs is a really nice balance. It keeps one on top of what’s going on, lets you see how people are doing, and even lets you give feedback on the code review process. This is a nice article arguing for the practice while guiding the reader past potential pitfalls.
Indiana University to Deploy Jetstream 2 Cloud with AMD, Nvidia Technology - Tiffany Trader, HPCWire
I don’t usually highlight new systems, but I’m excited about this one. You likely know I believe the research computing community has enormously over-invested in HPC-for-big-MPI-job infrastructure, and underinvested in high throughput or data intensive systems. During the current crisis with a shift to bio, a lot more time has been spent re-tuning batch systems on “big metal” systems to be decent high-throughput systems than on retrofitting high-throughput systems to support 100+ node jobs.
Jetstream has been one of the nicest multi-mode systems for what research computing is becoming, rather than what it was in the 90s, and I’m pleased to see a Jetstream 2.
Meeting reliability challenges with SRE principles - Cheryl Kang, SRE
A take on handling systems using Google’s SRE principles, focused on reducing three big sources of stress from running production software or systems:
All of the discussion here is completely relevant to research computing systems, for reasons well beyond reliability. Reducing toil and making monitoring meaningful are crucial for letting highly trained staff do more important work. And immature incident handling procedures... do I have to rehash the HPC blockchain-mining rant from a few weeks ago?
Nextflow Tower Launch - Paolo Di Tommaso, Seqera Labs
There will always be users who will have to ssh into compute nodes and write bash scripts to run their computational workloads. It’s crazy, though, that here in 2020 almost everyone has to do things this way. Most users run packaged software and need to only change configurations and inputs, or maybe create and run pipelines of such packages.
Nextflow Tower’s approach consists of having a DSL for running jobs, a library of such jobs and pipelines, and a nice GUI for starting such jobs. This may or may not be the way of the future for most users, but it’s surely closer to that future than “first, open a terminal…”
The 7th Annual Chapel Implementers and Users Workshop - Videos available
This event has come and gone, but the videos are now available online.
2020 International Workshop on Software Engineering for Computational Science - June 3-5, 2020, Discussion through June 12
An asynchronous conference - really interesting setup. Submitted papers and recorded talks, with Q&A via Google Docs. Some immediately relevant talks:
Linux Foundation Cloud Engineer Bootcamp - 24 weeks starting June 3, $599 USD
This is pricy and I’m skeptical of the utility of the certifications, but $599 USD for ~15-20 hours a week of material for 6 months, going from basic Linux sysadmining to Kubernetes, may be useful for some in our community.
AWS Public Sector Summit Online - June 30, Free
The Summit is big enough to need an overview blog post. A lot of relevant topics.
So there’s now a pretty active community of live-coding Twitch streamers. Live coding is great for teaching and training; heard of anyone doing things like this in research computing?
Malloc geiger counter - identify malloc/dealloc heavy phases of your code by listening to it.
An argument that functional programming had it wrong by focusing on the wrong abstraction, and that category theory was the way to go.
A DSL for beautiful programmatically generated mathematical diagrams.
So this new language people are talking about, Fortran, finally has an official webpage for the community.
Nice post on bfloat16, the 16-bit float with higher range and lower precision than IEEE’s fp16 to meet deep learning’s needs.
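To make the range-versus-precision trade concrete, here is a minimal Python sketch rather than anything taken from the linked post. It assumes bfloat16 can be modelled as the top 16 bits of a float32 (1 sign bit, 8 exponent bits, 7 mantissa bits) and that conversion simply truncates, whereas real hardware typically rounds to nearest. The helper names `to_bfloat16` and `to_fp16` are made up for illustration.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round-trip x through a bfloat16-like format by truncating a float32."""
    # Pack as IEEE float32 and view the 32 bits as an unsigned integer.
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    # Keep only the top 16 bits: 1 sign, 8 exponent, 7 mantissa bits.
    bits16 = bits32 >> 16
    # Pad the low 16 bits with zeros to get the value back as a float.
    (y,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return y

def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision using struct's 'e' format."""
    (y,) = struct.unpack(">e", struct.pack(">e", x))
    return y

if __name__ == "__main__":
    # Precision: fp16 has 10 mantissa bits, bfloat16 only 7.
    print(to_fp16(1.01), to_bfloat16(1.01))
    # Range: 2**17 fits in bfloat16's 8-bit exponent but overflows fp16.
    print(to_bfloat16(131072.0))
    try:
        to_fp16(131072.0)
    except OverflowError as err:
        print("fp16 overflow:", err)
```

Near 1.0 the fp16 result lands closer to the input than the bfloat16 one, while a value like 2**17 survives in bfloat16 and cannot be packed as fp16 at all.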
Fun and frustrating to watch big data communities learning things about floating point math - like Kahan sums - that the scientific computing community learns early on in their training. Frustrating because so many wheels are being re-discovered and re-invented because of scientific computing’s enthusiastic refusal to engage with these communities. I get the motivation behind “Not Invented Here”, but what’s with the “Invented Here, But You Folks Figure It Out Yourself”?
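For anyone who hasn’t run into Kahan sums, here is a minimal Python sketch of the textbook compensated-summation algorithm, not code from the linked discussion. The function names and the test data are my own choices for illustration. The naive loop is written out by hand because recent Python versions already use a compensated algorithm inside the built-in `sum` for floats.

```python
def naive_sum(values):
    """Plain left-to-right floating point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total

def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total

if __name__ == "__main__":
    # One large value followed by many terms too small to register on their own.
    values = [1.0] + [1e-16] * 100_000
    print("naive:", naive_sum(values))
    print("kahan:", kahan_sum(values))
```

With this input the naive loop never moves off 1.0, because each 1e-16 is below half the spacing between floats near 1.0, while the compensated version ends up very close to the true total of 1 + 1e-11.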
Deep dive into UEFI and how to write software to control the boot process.
And that’s it for another week.
Take care this weekend, and good luck in the coming week with your research computing team,