Sorry for the late issue. Between some stuff going on at my end and me still figuring out what a revamped newsletter will look like, this week’s newsletter is both a little late and a bit of a hybrid between where I’d like it to go and what it has been. (As always, suggestions for things you would like to see change to change or would like to stay the same are welcome - just hit reply or email me at [email protected])
The big news of the past week has been Apple’s new M1 CPU. Don’t worry, this newsletter is not going to degenerate into the cliched HPC/research computing blog writing solely and breathlessly about various new CPUs/network cards/SSDs and endlessly comparing speeds-and-feeds. The M1’s specs in and of themselves aren’t what’s interesting. Rather, the M1 is an example of how CPUs are going to get more different as time goes on, and that will have impacts on research computing teams. The M1 going to be a trial run for a future of more diverse computing architectures that we’d do well to get ready for.
Large-scale research computing systems have all been about “co-design” for ages, but the truth is that big-picture CPU design choices has been pretty fixed, with most of “co-design” being about choice of accelerators or mix and match between CPU, memory. and acceleration. Now that the market has accepted ARM as a platform - and with RISC-V on its way - we can expect to start seeing bolder choices for CPU design being shipped, with vendors making very different tradeoffs than have been made in the past. So whether or not you see yourself using Apple hardware in the future, M1’s choices and their consequences are interesting.
M1 makes two substantially different trade-offs. The first is having DRAM on socket. This sacrifices extensibility - you can’t just add memory - for significantly better memory performance and lower power consumption. Accurately moving bits back and forth between chips takes a surprising amount of energy, and doing it fast takes a lot of power! The results are striking:
LINPACK - solving a set of linear equations - is a pretty flawed benchmark, but it’s widely understood. The performance numbers here are pretty healthy for a chip with four big cores, but the efficiency numbers are startling. They’re not unprecedented except for the context; these wouldn’t be surprising numbers for a GPU, which also have DRAM-on-socket, and are similarly non-extensible. But they are absurdly high for something more general-purpose like a CPU.
Having unified on-socket memory between CPU and integrated GPU also makes possible some great Tensorflow performance, simultaneously speeds up and lowers power consumption for compiling code, and does weirdly well at running postgreSQL.
The second tradeoff has some more immediate effects for research computing teams. Apple, as is its wont, didn’t worry too much about backwards-looking compatibility, happily sacrificing that for future-looking capabilities. The new Rosetta (x86 emulation) seems to work seamlessly and is surprisingly performant. But if you want to take full advantage of the architecture of course you have to compile natively. And on the day of release, a lot of key tools and libraries didn’t just “automatically” work the way they seemed to when most people first started using other ARM chips. (Though that wasn’t magic either; the ecosystem had spent years slowly getting ready for adoption by the mainstream.)
“Freaking out” wouldn’t be too strong a way to describe the reaction in some corners; one user claimed that GATK would “never work” on Apple silicon (because a build script mistakenly assumed that an optional library that had Intel-specific optimizations would be present - they’re on it), and the absence of a free fortran compiler on the day of hardware release worried other people (there’s already experimental gfortran builds). Having come of computational science age in the 90s when new chips took months to get good tooling for, the depth of concern seemed a bit overwrought.
This isn’t to dismiss the amount of work that’s needed to get software stacks working on new systems. Between other ARM systems and M1, a lot of research software teams are going to have to put in a lot of time porting new low-level libraries and tools to the new architectures. Many teams that haven’t had to worry about this sort of thing before are going to have to refactor architecture-specific optimizations out and into libraries. Some code will simply have to be rewritten - some R code has depended on Intel-specific NaN handling to implement NA semantics (which are similar to but different from NaN) that M1 does not honour, so natively compiled R needs extra checks on M1.
It’s also not to dismiss the complexity that people designing and running computing systems will have to face. Fifteen years ago, the constraints on a big computing system made things pretty clear - a whackload of x86 with some suitably fast (for your application) network; the main question were how fat are the nodes and what’s the mix of low, medium, and high-memory nodes. It’s been more complex for a while with accelerators, and now with entirely different processor architectures in the mix, it will get harder. Increasingly, there is no “best” system; a system has to be tuned to favour some specific workloads. And that necessarily means disfavouring others, which centres have been loathe to do.
So the point here isn’t M1. Is M1 a good choice for your research computing support needs? Almost certainly not if you run on clusters. And if you’re good with your laptop or desktop, well, then lots of processors will work well enough - but a lot of software is going to now have to support these new systems.
And CPUs will keep coming that will make radically different tradeoffs than choices than seemed obvious before. That’s going to make things harder for research software and research computing systems teams for a while. A lot of “all the world’s an x86” assumptions - some that are so ingrained they are currently hard to see - are going to get upended, and setting things back right is going to take work. The end result will be more flexible and capable code, build systems, and better-targeted systems, but it’ll take a lot of work to get there. If you haven’t already started using build and deployment workflows and processes that can handle supporting multiple architectures, now is a good time to start.
But the new architectures, wider range of capabilities, and different tradeoff frontiers are also going to expand the realm of what’s possible for research computing. And isn’t that why we got into this field?
With that, on to the (shorter) roundup…
8 Steps to Creating a Virtual Employee Onboarding Program - Bruce Anderson
Preserving Culture When Someone Leaves the Team - Mark Wood
Whether people are leaving or joining, as managers we have to be deliberate about the team we’re helping build. Wood points out that when someone leaves, you have the opportunity and responsibility to think about what behaviours that person had that shaped the team, and what you’ll do to replace those - asking other people (maybe including yourself) to do some of some of those things to fill in the gap, hiring someone who will do similar things. You also have the opportunity to think about what behaviours make sense to end or change and take steps there, too.
When onboarding, new employees are no longer immersed in the team’s culture they were when you were all colocated, so you have to take active steps to make sure they understand how your team works and get up to speed quickly. Make sure there’s quick wins lined up for them, communicate endlessly about the things that are important, make sure you’re building horizontal communications between team members not just vertical communications between you and them, and make sure you (and other team members) are modelling the behaviour you expect to see from them.
Renegotiating your first vendor contract - Will Larson
Eventually we all have to negotiate or renegotiate our first contract with vendors. This short article won’t give any secrets that will win you huge concessions. It will hopefully make the process less stressful by providing a game plan and by setting some realistic expectations, on their side (your vendor wants money, and is willing to make some small trades on the margins for the exact amount of money, but not much) and yours (you should absolutely be able to assume decreasing marginal costs, reasonable timelines, and to see issues seen previously addressed).
Organising Large Miro Boards For Remote Workshops - Nick Tune
The Miro Sprint Planning Playbook - Miro
Since no longer being colocated with our team members and research communities is going to be the norm now, we have to continue to improve how we run events that used to be highly interactive and based on whiteboards or flip charts or the like.
Miro seems like a perfectly good distributed whiteboard application - there are others which also seem perfectly good, but Miro is the leader at the moment. Here are two concrete sets of recommendations for using Miro for long multi-day workshops (by Tune), and one for running sprints and retros (by Miro themselves).
There are other tools which are good for more specific applications - for strategic planning type workshops I’ve had good luck recently with Axis to generate prioritized service catalogues and do things like audience-sourced SWOT analyses.
From Sysadmin to SRE - Josh Duffney, Octopus Deploy
As research computing becomes more complex, our systems teams are going to have more and more demands on them, moving them from sysadmins to systems reliability responsibilities, and working more closely with software development teams. It’s an easier transition for sysadmins in research computing than in most fields, as our teams generally have pretty deep experience on the software side of research computing too.
Duffney’s article lays out how to start thinking about these changes to responsibilities and what people can start doing today to move in that direction. The key thing - it’s not about tools, its about how to think about your role, seeing rough spots in the build/deploy/operate cycle, and working to improve them.
Supercomputing 2020—New MPI heights, joining the Graph500, and 1 TB/s filesystems - Evan Burness, Microsoft Azure Blog
Azure is increasingly agressively going after research computing and HPC, and announced several quite cool things at SC2020:
Mersenne twisters aren’t great random number generators.
Resources for teaching data engineering.
An opinionated list of CLI utilities for monitoring and inspecting Linux/BSD systems.
A growing list of training material for new research software developers.
A jumphost-only ssh server, lazyssh.
An argument for teaching people to program starting with testing.
Interesting book (and pointers to other resources) on career trajectories in and perspectives on software architecture. Along the same lines, here’s an architecture playbook.
AWS’s automatic S3 access-frequency tiering is getting deeper and smarter.