#79 - 18 June 2021

One-on-one vents; the great job migration; always be quitting; moving a 400 person hackathon online; glue as the dark matter of software; security secrets

Hi!

I’m back and refreshed from my mini-break from the newsletter; there’s a lot of optimism in the air here as COVID-19 numbers continue to drop and projects are moving forwards.

I don’t have a lot else to add right now, so let’s go straight to the roundup!

Managing Teams

When a 1:1 turns into a vent session - Yihwan Kim
One-on-one 101 - Tiffany Longworth, Clickety
The Update, The Vent, and The Disaster - Michael Lopp

A couple good articles on one-on-ones from a a couple of weeks ago.

In the first, Kim offers five pieces of advice around vents. In particular, vents aren’t about problem solving, they’re about letting something out - and that something has been festering for a while. So Kim’s advice is:

  • Don’t rationalize, and definitely don’t interrupt
  • Don’t assume — ask
  • Don’t take a stance (you’re only hearing one side of things), but don’t be afraid to have opinions
  • Try to end on a positive note
  • See if you need to follow-up

In particular sometimes just letting something out is enough - but sometimes it’s raising something that needs to be addressed in future discussions.

In Longworth’s article, she talks about her formula for one-on-ones:

  • Her first question is always “What’s the most important thing we talk about today?”
  • After having the team member lead the discussion for a while, she addresses follow ups from previous meetings/things that came up this week.
  • Asks about the team member’s wins for week, and worries
  • Gives any org updates - this can be really tricky to do in one-on-ones and in my opinion is best handled in staff meetings
  • Any time at the end gets filled with Weird Manager Questions; she asks if she can ask a weird manager question and then asks one of these:
    • Think about how you spend your time at work. Divide this sticky note/whiteboard square/drawing doc into the different things you do. Are these the right amounts of time you should/want to be spending on each of these? What would an ideal distribution of your time look like?”
    • “What ways can we show you we appreciate what you do?”
    • “Using Paloma Medina’s BICEPS framework, how are you doing across each of your core needs?” I like to use a scale model and track this over time. You can use something like “On a scale of 1 to 5, where 1 is ‘I feel awful’ and 5 is ‘happy as a clam,’ where are you on Belonging?” or something more tactile like “Here’s an empty battery outline - how ‘full’ or ‘empty’ are you feeling on Predictability?”

The last article is an old one but includes elements of both Kim and Longworth’s articles. In particular, Lopp considers a one-on-one that is just “an update” to be a bit of a miss, and shares his own approach for dealing with those - bringing up three particular points, doing a mini performance review on some specific area, or taking the time to get input on some fire he’s fighting. For the vent or the disaster Lopp gives similar advice to Kim - and suggests that one purpose of one-on-ones is to provide an environment where some of those minor dissatisfactions can be surfaced before they become vents or disasters.


Robert Half Research Points To Strong Job Optimism Among U.S. Workers - Robert Half Press Release

A recruiting company (so take with a grain of salt) did an online survey

The online survey was developed by Robert Half and conducted by an independent research firm from March 26 to April 15, 2021. It includes responses from more than 2,800 workers 18 years of age or older at companies in 28 major U.S. cities with 20 or more employees.

and they found that 32% were planning to start a job search in the next few months. There’s a lot of natural attrition that’s not happened over the last year and a half due to people clinging to familiarity in a time of uncertainty, and now as things open a bit up people are tired and looking for a change. Salary and not enough job advancement were tied for top reason why people were considering looking for new work.

The survey also said that and 47% wanted fully remote work.


There’s an interesting discussion of effective messaging for candidates for those doing active recruiting - on LinkedIn in this case, but it could just as well be over email or somewhere else. The advice is pretty straightforward but easy to forget - talk to them like an individual human - with some concrete suggestions about when and how to send the message.

Are other groups in research computing doing active recruiting, or soliciting recommendations? I don’t hear of a lot of groups doing it.


Managing Your Own Career

Always be quitting - Julio Merino

If we knew we were quitting (or just going on a long vacation) in two months, what would we be doing differently at work? Probably documenting a lot more, making sure people were coming to meetings so that they could take our place when we weren’t there, training up people to be able to take over parts of our role for us.

But those activities are key and routine parts of being an effective manager or technical leader. Merino commends this approach to us - to be imagining that we were leaving, and being in the mindset of making ourselves less necessary. It’s a vital approach to scaling up our team and making room to take on new challenges, even in the same role.

In particular, Merino suggests -

  • Document your knowledge, long-term plans, and meetings
  • Bring others to meetings
  • Train people around you
  • Identify and train your replacement(s)
  • Give power to the people
  • Don’t make yourself the point of contact - establish mailing list or other forms of communication
  • Delegate
  • Always be learning

These are great points to be consistently working on, whether you’re a manager, an individual contributor with a technical leadership role, or even just intending to take on either of those roles.


Product Management and Working with Research Communities

How to move a 400-person hackathon online - Juri Chomé

Even post pandemic, a lot of people are going to be more willing to do virtual events than they were in 2019, which opens up a lot of collaborative opportunities. Here Chomé walks through how they moved ZuriHac, a Haskell hackathon held in June ever year, to purely online. They talked about the tools they used (Discord for chat, with one use of Zoom, Streamyard + Youtube for streaming presentations, repl.it for some live coding, and an existing registration system) and how they actually ran things. With ~50 projects, the remote collaboration within projects wasn’t hard, but coordinating all of the activities across projects and the hackathon-wide activities took a lot of doing.


Resource and Career Center Pilot for Advancing Computational and Data-intensive Research Earns $1.49M NSF Award - press release, Internet2

This looks to be a promising development for the development of research computing and data as a profession. Since #21 I’ve been talking about the CaRCC’s Research Computing and Data Capabilities Model, a still-underused tool for thinking about the variety of services needed for providing RCD support to researchers in academic institutions. Some of the same group is going to turn their mind to RCD staff training and career pathways. From the press release:

Among the goals of the pilot program is to develop resources for staff training and workforce development, including leading practices for recruitment, onboarding, advancing diversity, equity and inclusion, professional development, and proven models for student internship and training programs. […] In addition to training and workforce development, the Resource and Career Center pilot will create and share a model of career arcs for RCD professionals to explain career options and help existing RCD professionals explore professional development and advancement opportunities.


Research Software Development

Glue: the Dark Matter of Software - Marcel Weiher

Weiher hypothesizes that one reason that code sizes of systems are exploding even as individual reusable components are plentiful is that we don’t have any real sensible way to deal with glue code. There are DSLs for algorithms but nothing similar for the plumbing code that makes the output of one component useful as the input for another; and what’s more, the number of those connections is quadratic in the number of components.


Clever vs Insightful Code - Hillel Wayne

Wayne distinguishes between “clever” code, which exploits arcane knowledge of the programming language/os/library, with “insightful” code, which exploits knowledge of the problem. Both can make code brittle - changes may make the cleverness/insight irrelevant - but documenting the insight you’re exploiting is a good and useful thing, and puts the focus where it should be, on the problem.


I tried out VSCode Remote Repositories this past week (including for the newsletter website) and I have to say it was really slick. Fire up vscode, open a GitHub repo, and browse code or even create and develop on a branch right there. If you want to run tests etc you’ll want to use code spaces, but for code browsing or simple development this is really slick.


Research Computing Systems

(Technical) Infosec Core Competencies - Jan Schaumann
Linux Privilege Escalation: Exploiting Capabilities - Stefano Lanaro
Kerckhoffs’s Law for Security Engineers - Devdatta Akhawe
How to Handle Secrets on the Command Line - Carl Tashian

Good security is good operations.

Schaumann goes through a very approachable what he views as core competencies for information security practitioners, and crucially most of them are not deep and profound understanding of obscure corners of the kernel or cryptographic implementations, but being able to read and learn from relevant news sources and being a careful and deliberate system administrator:

You may also notice that a lot of this overlaps with a general understanding of… well, computering on the internet, with operations and system administration concepts. This is no coincidence. Good ops is good security.

Relatedly, Lanaro walks us through exploiting linux capabilities - not by plunging deep into the linux kernel, but by treating it correctly as a finer-grained but similar issue to exploting suid executables, looking for files that have capabilities they shouldn’t have and using them.

Akhawe talks about Kerchoff’s law - no “security via obscurity”, rely on no secrets other than the key - and why it matters; you can change the key easily. Except, unless you actually are changing the key from time to time, you don’t actually know for sure that you can change the key easily. Build in key rotation from the beginning to make sure that a breach of the key can be readily mitigated.

And finally, Tashian goes through the steps necessary to make sure secrets provided on the command line to aren’t available to anyone who happens to “ps” at the right time.

All of these have a similar theme - good security is good operations, and being excellent at security doesn’t mean wizard-level knowledge of hardware and kernels but by broad knowledge of a lot of tools, and using them wisely.


Unreliability At Scale - David Rosenthal
Silent Data Corruptions at Scale - Dixit et al., arXiv:2102.11245
Cores that don’t count - Hochschild et al., HotOS ’21

CPUs with bugs aren’t new - some of us will remember the Pentium FDIV bug - but with increasing complexity (and smaller components) hyperscalers are starting to see bugs on individual cores of individual machines that are extremely difficult to detect much less debug.

Rosenthal provides a nice summary of two recent papers, by Facebook (Dixit et al.) and Google (Hochschild et al.) Both tell wonderful debugging stories - Facebook discovering files were being dropped by a pipeline and it all came down to a specific mathematical operation being repeatably mis-performed on core 59 on a particular box - they nailed it down to a 60-line assembly reproducer. Google found silent data corruption caused by a “few mercurial cores per several thousand machines” that included (from Rosenthal’s summary):

  • Violations of lock semantics leading to application data corruption and crashes.
  • Data corruptions exhibited by various load, store, vector, and coherence operations.
  • A deterministic AES mis-computation, which was “self-inverting”: encrypting and decrypting on the same core yielded the identity function, but decryption elsewhere yielded gibberish.
  • Corruption affecting garbage collection, in a storage system, causing live data to be lost.
  • Database index corruption leading to some queries, depending on which replica (core) serves them, being non-deterministically corrupted.
  • Repeated bit-flips in strings, at a particular bit position (which stuck out as unlikely to be coding bugs).
  • Corruption of kernel state resulting in process and kernel crashes and application malfunctions.

Emerging Technologies and Practices

Migrate Your Workloads with the Graviton Challenge! - Steve Roberts, AWS News Blog

In #74 we mentioned an AWS-sponsored hackathon for the porting codes for the graviton2s; for those who are interested in the work but are too busy to block off time for a hackathon, AWS is also supporting a graviton challenge, with all valid entries getting a $500 AWS credit, and prizes for best adoptions. Between 22 June and 31 Aug, perform eight steps over any four, non-consecutive, days, and record a short video about your results and you’re in.


Calls for Submissions

A tonne of calls for submissions for events to be held as part of SC21 in November in St Louis, MO:


EMCLPKDD Workshop on Automating Data Science (ADS2021) - 17 Sept, Virtual, 6 page papers due 23 June

Topics include

  • Automating data wrangling
  • Data integration via AI techniques (e.g., NLP)
  • Merging the preparation of data into the statistical learning
  • Handling missing and anomalous values semi-automatically
  • Using NLP for generating explanations and reports.
  • Incorporating domain knowledge into the automation of data science.
  • Semi-automating visualization
  • Semi-automated machine learning
  • Learning with non-normalized data
  • Impact of data science automation on the work of data scientists

Re-envisioning Extreme-Scale I/O for Emerging Hybrid HPC Workloads (REX-IO ’21) - 7 Sept, Virtual, 6 page papers due 2 July

Topics include

  • Understanding I/O inefficiencies in emerging workloads such as complex multi-step workflows, in-situ analysis, AI, and data analytics methods
  • New I/O optimization techniques, including how ML and AI algorithms might be adapted for intelligent load balancing and I/O pattern prediction of complex, hybrid application workloads
  • Performance benchmarking, resource management, and I/O behavior studies of emerging workloads
  • New possibilities for the I/O optimization of emerging application workloads and their I/O subsystems
  • Efficient tools for the monitoring of metadata and storage hardware statistics at runtime, dynamic storage resource management, and I/O load balancing
  • Parallel file systems, metadata management, and complex data management

SeptembRSE, the Fifth Conference of Research Software Engineers - 6-30 Sept Virtual, Submissions (Talks, posters, walkthroughs, workshops, panels, discussions) due 2 July

Covering anything of interest to research software development.


18th Annual IFIP International Conference on Network and Parallel Computing (IFIP NPC) - 3-5 Nov 2021, Paris, 12-page papers due 11 July

A wide conference covering topics on parallel and distributed applications and algorithms, architectures and systems, and distributed software environments and tools.


OpenMPCon - Virtual event 3 Sept, Submissions (presentations, posters, tutorials, BoF, panel discussions) due 5 July

Topics of interest include

  • The use of OpenMP in any scientific or high-performance computing (HPC) application domain
  • Development tools, including compilers, debuggers and profilers and reference implementations
  • HPC frameworks and libraries developed with OpenMP
  • Performance and portability, including benchmarking
  • Proposed OpenMP extensions
  • Using OpenMP on accelerators and GPUs
  • Best practices for using OpenMP, including Gotchas and FAQ’s
  • Comparisons with other parallel programming languages

Events: Conferences, Training

ACM HPDC 2021 - The 30th International Symposium on High-Performance Parallel and Distributed Computing, 21-25 June Online, $50-$100

Little short notice for this one, sorry, but next week there’s a series of talks on a pretty wide range of parallel and distributed computing topics - program is here.


Nordic-RSE online unconference 2021 - 29-30 June, 13h-16h CET, Free(?) registration

Not limited to those in the nordics (or even those who consider themselves RSEs). Helpfully, the conference provides shorter and longer teasers for the event (note to organizers! This stuff makes it much easier for people like me to include your event in roundups!)

Are you developing software or tools that are driven by research/engineering in either academia or industry? Need to network, share knowledge and experiences with your peers? Maybe you have heard of something called research software engineers and you would like to know more? Nordic-RSE invites everyone interested in such topics to join our online unconference (lightweight get-together) on June 29 and 30, where we let the participants shape the agenda (“birds of a feather”). To kick off the event, we also have four invited speakers, including:

Kristoffer Carlsson (JuliaComputing) “Julia for research software engineers”, Athanasia Monika Mowinckel (University of Oslo) “Developing and distributing in-house R-packages”, Shahnawaz Ahmed (Wallenberg Centre for Quantum Technology) “Keep your code alive - lessons from the QuTiP project”.


Random

In further “cryptocurrencies making everything worse” news, Dockerhub is now cancelling autobuilds for free accounts. I’m sort of dreading GitHub Actions going the same way. We’ve signed up for an open source docker hub teams account but the line is so long we’ve been waiting to hear back for weeks.

A simple handmade CP/M or DOS linker for teaching about how linkers (and the old .COM files) work.

Modularize your bashrc with a .config/bashrc.d directory.

A tutorial on memoizing slow bash script operations that are used repeatedly in a long script - using bash4 coprocesses.

Bash quoting is subtle, and then with ssh commands another layer of quoting and parsing is added. Here’s a walkthrough of quoting bash commands called over ssh.

VSCode now lets you directly work with remote github repos without downloading them; I tried them myself this week. This along side Codespaces is probably going to change my development workflow quite a bit.

A walk through on using extremely cheap google compute platform resources to run an app - the example is an Elixir app and there’s some stuff in there that’s elixir specific to optimize for the constrained resources but most of it is pretty general.

There’ll be a twitter chat on the extreme-scale scientific software stack on June 21st - interesting to see the attempt to use different media for communication and outreach.

“Modulinos” - files that behave as a library when imported, or a script when executed - in bash.

asciiflow for drawing ascii diagrams, and exporting them (including prepending them with typical comment characters for use in code).

In defence of swap.