#34 - Link Roundup, 24 July 2020

Hi everyone -

I hope it’s been a good week for you. On our side we’re setting up our first Jira sprint on a project I’m really excited about - and we have a really great PM helping us, shoring up one area where I personally am not great. The team, the project, and getting some help in a managerial area I’m weak in have me really eager and optimistic about how this is all going to go.

In terms of the newsletter, I’m gearing up to try the Q&A and the interviews that were suggested.

For questions, the first version of the Research Computing Teams AMA (Ask Managers Anything) is up here. Please post (they’re moderated for spam) and upvote questions! Top questions will get posed to the community every week with results summarized.

Interviews will take a little longer: I’m starting to line up interviewees. If you have people or projects you’d like to hear from, or are willing to volunteer, let me know (just reply to this email). All suggestions warmly welcomed.

As always, let me know what you find interesting or useful; just hit reply to any newsletter email and the response will just go to me - or feel free to email jonathan@dursi.ca.

And on to the link roundup:

Managing Teams

Microsoft Analyzed Data on Its Newly Remote Workforce - Natalie Singer-Velush, Kevin Sherman, Erik Anderson, Microsoft, writing at HBR

A lot of big companies have pretty decent datasets on how their teams work and how that got disrupted by suddenly working distributed during a pandemic. I find what they’re discovering fascinating, even if one has to be careful about trying to generalize.

Microsoft has a 350-person “Modern Workplace Transformation” team that has consented to have their MS Teams IMs, emails, calendars, videoconference lengths, etc. analyzed - their job is to figure out how people work in an increasingly digital age. So that team has very fine-grained data on how those teams have been coping.

The article is long enough that I won’t try to summarize it here. Some key relevant bits:

Employees who had the most one-on-one communication with their manager ended up having the least increase in total number of hours working. I can imagine lots of reasons for that - clarity of expectations, more feedback (in both directions) so less unnecessary work, etc. Folks, do your one-on-ones.
Managers are bearing the brunt of the transition to remote work, with more hours worked to do the necessary coordination and glue work and take care of our team members. This is good and proper, but we have to look after ourselves too, y’all.
Managers are collaborating more with peers. I’ve found that too - a lot of silo walls seem to be more porous now.
It didn’t take long to settle into a new normal. That’s good and bad; the disruption settled down quickly, but I think after 3-4 months of this we know better than after the first 1-2 weeks what works well and what doesn’t; I think a lot of our teams could use another shakeup to reset some things.
Lots of IMs between 6pm and midnight, and work bleeding into the weekend. Maybe some of those are more social but it’s hard to see that as anything other than bad.
Meetings got more numerous but shorter.

Managing for Neurodiversity - Anjuan Simmons

This is a short and useful discussion from an experienced tech manager about managing team members who are expressing behaviours that might suggest neurodivergence:

They simply receive information about the world and process it in different ways. In fact, no two people see and respond to the world in the same way. We all need to make accommodations for these differences whether we’re talking about introversion, extraversion, autism, or dyslexia.

The thing I like about this article is it strongly counsels against any kind of armchair diagnosing - neither you nor I have the skills for that - and focuses on three common categories of behaviours, with concrete remediations for each.

Short Attention Span
Distractibility
Hyperfocussing

As Simmons points out, being ready with these remediations in your managerial toolkit makes the work environment better for everyone. Honestly, with everything going on - the pandemic, extended work from home, racial injustices, and police backlash - who doesn’t have some occasional short attention span and distractibility? And which of us has no team members (or selves) who have gotten sucked into a rabbit-hole which has them hyperfocussed on the wrong something?

Don’t Create Chaos - Stay SaaSy
How to Lead Decisively when you Don’t Know What’s Next - Karin Hurt and David Dye, Let’s Grow Leaders
Making Decisions with Others - Deepak Azad

“Great leaders vacuum up chaos” - The Stay SaaSy post uses this as a nice way to describe one really important function of managers. We have to be entropy-fighters, reducing chaos and uncertainty about how a project will go forward, what priorities should be, what good next steps are for a team member’s career, and any number of other things. And one key point they make:

The fancier your title, the more you must avoid causing chaos.

This is something that a lot of new managers (including myself at the time) don’t get. Once you’re a manager, you can accidentally sow chaos by musing aloud about some idea that just crossed your mind, or asking lots of questions about stuff that doesn’t matter but you’re just personally curious about.

All that chaos reduction means making decisions, especially in the face of uncertainty. That’s really hard for some of us in research. I, probably like many of us, was trained in academia. My wife was trained in the Emergency Room. One of us is much better at decisive decision making! The other of us prefers to thoroughly and leisurely analyze things, maybe read a couple of books first. No points for guessing who is who.

The short Hurt and Dye article urges us to lean into uncertainty in decision making, by accepting the uncertainty and acting anyway:

Ground yourself in your values.
Stay focused on what matters most.
Make the best, next, small, bold decision.
Show up with confident humility, and
Prepare for the pivot.

The article by Azad counsels us to have clarity about our decision making process, especially for decisions large enough to have to think about RACI:

What decision needs to be made?
What’s the timeline of the decision making?
Who will do the work and arrive at a decision?
Who are the stakeholders?
Who will ratify or veto the decision?
Who will need to be informed of the decision?
When can you revisit a decision?

Even for smaller decisions, being clear with yourself about what decision needs to be made now, setting a timeline for the decision making, understanding what the consequences of making the decision are, and being clear on how easy or hard it is to revisit the decision later has made the decision making process a lot easier for me. Now it’s like one book, tops.

Managing Your Own Career

Why Asking for Advice Is More Effective Than Asking for Feedback - Jaewon Yoon, Hayley Blunden, Ariella Kristal and Ashley Whillans

A nice older article that crossed my desk again, arguing that you’ll get more open and useful input from a broader range of people by asking for “advice” than “feedback”. As always, you don’t need to follow every piece of advice you get, but you should at least take the advice seriously enough to consider.

How I Offloaded My Anxiety To Trello - Cate

A manager’s story about using Trello to manage work and life tasks to feel more organized and less stressed.

I haven’t adopted everything from Getting Things Done, but there are two things I absolutely swear by, and that the author also finds useful in the system described here:

Get tasks out of your head and onto paper/todo app/whatever as soon as possible
Keep all such things in one place - not a “life” and “work” one.

Myself, I’ve got a paper journal for the day and am using OmniFocus as the “data lake” of todo tasks. It doesn’t quite work; something like Trello could easily be better.

Product Management and Working with Research Communities

Collaborating on Research Data Support - Christina Maimone

This is a short and useful “what worked well/what was challenging” overview of three initiatives at Northwestern where Research Computing and the Libraries collaborated on research data support. Both entities have a lot of experience and a lot of resources around research data management, and have greater or lesser amounts of reach with different parts of the University community.

Even though your research computing team and your library may be quite different, I think there’ll be a lot of commonalities here if you’re thinking of trying similar initiatives. The Library has a different natural constituency than Research IT, different means of communicating, acts on different timescales, and has different institutional priorities. But that diversity of audiences, needs, expertises, and strengths can be an advantage if you can figure out how to split responsibilities in a way that meets everyone’s needs. It sounds like Northwestern was on their way to some success with this, although the pandemic has interrupted this work (a lot of the support efforts were built around in-person events).

Wiki.js

A lot of us need to put up documentation or knowledge-sharing websites. My last MediaWiki install did… not go great, so this JavaScript-based, pretty modern and lightweight wiki with support for a lot of authentication mechanisms caught my eye.

Research Software Development

Post-Commit Reviews - Cindy Sridharan

A thought-provoking post by a very experienced and successful software development manager, advocating for teams to at least consider post-commit (but pre-deploy) code reviews once they have a sufficiently mature CI/CD environment. Along the way, a nice discussion of the role of code reviews (not just for bug-finding) and CI/CD. I don’t know yet how I feel about the recommendation but it’s certainly worth reading.

Software should be designed to last - Alfonso de la Rocha

The arguments here apply particularly to research software, but there’s a real tradeoff here too.

If we’re serious about writing research software that’s reusable, we need to take into account how research software gets reused. And the fact is that a lot of research software often lies fallow for a while, used for some project and then pretty much quiescent until a (say) postdoc picks it up for another, only somewhat related project. But a good chunk of the time, it doesn’t work any more.

So it’s important to find a way to routinely write research software that can survive a while without being actively maintained by the original project. It’s not enough that research software be sustainable, it needs to be somewhat less biodegradable than it usually is. That way it can be in decent enough shape that our poor postdoc can successfully run it again. (And no, container images or static binaries aren’t a panacea here).

The argument advanced here is mainly about dependencies - whittling them down to a small number - to reduce the cross-section for breakage by a dependency. That’s probably part of the answer. I think another is automated tooling to check for dependency updates. But I worry that a more fundamental one comes down to choosing frameworks and languages that are almost certainly going to be around in ten years, and that means forgoing a lot of new and very helpful (and productive) tools.

What do you think, are there options I’m missing for building research software that lasts?

The Data Science Lifecycle Process - Charles Morris & William Buchanan

This is a template GitHub repo for data science projects. More than just a list of recommended directories and their uses, it’s a set of recommendations for the lifecycle of a data science project, which could be very useful for a lot of research work. There’s a pre-configured set of issue templates and labels; a well-thought-out branching strategy including

Collaboration Branches
Feature and Issue Branches
Data Branches
Explore Branches
Experiment Branches, and
Model Branches

as well as guidance on artifact management and deployment and operations.

It’s very well thought out and worth looking through if your team helps perform data analysis projects, even if you don’t plan to use the template. Software development and data science projects have a lot to teach each other, and research computing is really where they come together, so efforts to combine the two catch my eye. Another one that crossed my desk this week is SplitGraph, another “git for data” effort.

Emerging Data & Infrastructure Tools

The Road to Kata Containers 2.0 - Horace Li, The New Stack

Nice overview of the history and near future of Kata Containers, an open source project for containers that are a little closer to VMs (they have their own instances of the kernel) and make use of hardware support in the CPUs for isolation. It’s no longer an Intel-only project but supports AMD, ARM, and IBM Power (and IBM z-series, if you’re into that).

1.1 Billion Taxi Rides using OmniSciDB and a MacBook Pro - Mark Litwintschik

While Postgres and MySQL are ubiquitous, a lot of researcher use cases come down to being able to do fast and varied analyses of the data. That’s often best handled by columnar databases, and the number and scope of open source columnar databases are growing remarkably - there are way more than back in the days of MonetDB. OmniSciDB (née MapD) is new to me but looks like it might be worth playing with.

Events: Conferences, Training

34th VI-HPS Tuning Workshop - 28 - 30 July, 9am - 5pm BST, Free

A well-regarded workshop on HPC tuning tools. The hands-on sessions will be on Archer2 but the content should be useful more broadly.

Engineering Management 101 - Anytime, $150 USD

The folks at Developers First have put together a 4-5 hour self-guided course for new tech managers.

Random

Singer lets you convert data between record-based file formats by defining an intermediate representation. So for instance if you write a ‘tap’ to read data out of an API, it becomes trivial to output it to database tables, CSVs, or Google Sheets, where targets are already written. Hmm.
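As a rough sketch of what that intermediate representation can look like: a tap writes one JSON message per line to stdout, and a target reads them from stdin. The example below assumes the SCHEMA / RECORD / STATE message types described in the Singer spec. The `users` stream, the sample rows, and the `emit` and `run_tap` helpers are all made up for illustration and are not part of any Singer library.

```python
import json
import sys


def emit(message, out=sys.stdout):
    """Write one Singer-style message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def run_tap(rows, out=sys.stdout):
    """Hypothetical tap: describe a 'users' stream, then emit its rows."""
    # SCHEMA tells the target what the records in this stream look like.
    emit({
        "type": "SCHEMA",
        "stream": "users",
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
    }, out)
    # One RECORD message per row of data.
    for row in rows:
        emit({"type": "RECORD", "stream": "users", "record": row}, out)
    # STATE is a bookmark a later run could use to resume (assumed layout).
    last_id = max((row["id"] for row in rows), default=None)
    emit({"type": "STATE", "value": {"users": last_id}}, out)


if __name__ == "__main__":
    # Stand-in for rows that a real tap would fetch from an API.
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])
```

Saved as, say, tap_users.py (a hypothetical filename), the output could then be piped into an existing target, for example `python tap_users.py | target-csv`, assuming that target is installed.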

An old debugging war story - email wouldn’t go more than 500 miles.

An update from Twitter about their security hack, posted the day after the event. Researchers deserve at least as much transparency about the systems and software they use for science as do the users of a free social media site.

More for the “files are bad, actually” folder. Can applications recover from fsync() failures? Spoiler - mostly no.

Azure has followed AWS in starting to release a series of “Well-Architected Framework” documents. Having a set of reference architectures and guidelines makes it much easier for teams to get started.

Some testimonials about using Basecamp’s ShapeUp process rather than traditional Agile sprints. I think for large enough research computing teams this is a promising approach.

Google’s released a really cool autodifferentiation package for Python called JAX.

A random forest model for choosing the right cloud provider and instance type for an HPC-type application.

A module system for bash scripts(!!)

A link shortening service implemented entirely in GitHub Pages and GitHub Actions.

Managing your dotfiles with .git

A templating system for JSON, YAML, or ini config files.