Hi everyone -
I hope it’s been a good week for you. On our side we’re setting up our first Jira sprint on a project I’m really excited about — and we have a really great PM helping us, shoring up one area where I personally am not great. The team, the project, and getting some help in a managerial area I’m weak in all have me really eager and optimistic about how this is going to go.
In terms of the newsletters, I’m gearing up to try the Q&A and the interviews that were suggested.
For questions, the first version of the Research Computing Teams AMA (Ask Managers Anything) is up here. Please post (they’re moderated for spam) and upvote questions! Top questions will get posed to the community every week with results summarized.
Interviews will take a little longer: I’m starting to line up interviewees. If you have people or projects you’d like to hear from, or are willing to volunteer, let me know (just reply to this email). All suggestions warmly welcomed.
As always, let me know what you find interesting or useful: just hit reply to any newsletter email and the response will go straight to me - or feel free to email firstname.lastname@example.org.
And on to the link roundup:
Microsoft Analyzed Data on Its Newly Remote Workforce - Natalie Singer-Velush, Kevin Sherman, Erik Anderson, Microsoft, writing at HBR
A lot of big companies have pretty decent datasets on how their teams work and how that got disrupted by suddenly working distributed during a pandemic. I find what they’re discovering fascinating, even if one has to be careful about trying to generalize.
Microsoft has a 350-person “Modern Workplace Transformation” team that has consented to have their MS Teams IMs, emails, calendars, videoconference lengths, etc. analyzed - their job is to figure out how people work in an increasingly digital age. So that team has very fine-grained data on how those teams have been coping.
The article is long enough that I won’t try to summarize it here. Some key relevant bits:
Managing for Neurodiversity - Anjuan Simmons
This is a short and useful discussion from an experienced tech manager about managing team members who are expressing behaviours that might suggest neurodivergence:
They simply receive information about the world and process it in different ways. In fact, no two people see and respond to the world in the same way. We all need to make accommodations for these differences whether we’re talking about introversion, extraversion, autism, or dyslexia.
The thing I like about this article is it strongly counsels against any kind of armchair diagnosing - neither you nor I have the skills for that - and focuses on three common categories of behaviours, with concrete remediations for each.
As Simmons points out, being ready with these remediations in your managerial toolkit makes the work environment better for everyone. Honestly, with everything going on - the pandemic, extended work from home, racial injustices, and police backlash - who doesn’t have an occasional short attention span and bouts of distractibility? And which of us has no team members (or selves) who have gotten sucked down a rabbit hole, hyper-focused on the wrong thing?
Don’t Create Chaos - Stay SaaSy
How to Lead Decisively when you Don’t Know What’s Next - Karin Hurt and David Dye, Let’s Grow Leaders
Making Decisions with Others - Deepak Azad
“Great leaders vacuum up chaos” - The Stay SaaSy post uses this as a nice way to describe one really important function of managers. We have to be entropy-fighters, reducing chaos and uncertainty about how a project will go forward, what the priorities should be, what good next steps for a team member’s career are, and any number of other things. And one key point they make:
The fancier your title, the more you must avoid causing chaos.
This is something that a lot of new managers (including myself at the time) don’t get. Once you’re a manager, you can accidentally sow chaos by musing aloud about some idea that just crossed your mind, or asking lots of questions about stuff that doesn’t matter but you’re just personally curious about.
All that chaos reduction means making decisions, especially in the face of uncertainty. That’s really hard for some of us in research. I, probably like many of us, was trained in academia. My wife was trained in the emergency room. One of us is much better at decisive decision-making! The other prefers to thoroughly and leisurely analyze things, maybe read a couple of books first. No points for guessing who is who.
The short Hurt and Dye article urges us to lean into uncertainty in decision making, by accepting the uncertainty and acting anyway:
The article by Azad counsels us to have clarity about your decision making process, especially decisions large enough to have to think about RACI:
Even for smaller decisions, being clear with yourself about what decision needs to be made now, setting a timeline for making it, understanding what the consequences of the decision are, and knowing how easy or hard it will be to revisit later has made the decision-making process a lot easier for me. Now it’s like one book, tops.
Why Asking for Advice Is More Effective Than Asking for Feedback - Jaewon Yoon, Hayley Blunden, Ariella Kristal, and Ashley Whillans
A nice older article that crossed my desk again, arguing that you’ll get more open and useful input from a broader range of people by asking for “advice” rather than “feedback”. As always, you don’t need to follow every piece of advice you get, but you should at least take it seriously enough to consider.
A manager’s story about using Trello to manage work and life tasks to feel more organized and less stressed.
I haven’t adopted everything from Getting Things Done, but the system the author describes here includes two things I absolutely swear by.
Myself, I’ve got a paper journal for the day and am using Omnifocus as the “data lake” of to-do tasks. It doesn’t quite work; something like Trello could easily be better.
Collaborating on Research Data Support - Christina Maimone
This is a short and useful “what worked well/what was challenging” overview of three initiatives at Northwestern where Research Computing and the Libraries collaborated on research data support. Both entities have a lot of experience and a lot of resources around research data management, and have greater or lesser amounts of reach with different parts of the University community.
Even though your research computing team and your library may be quite different, I think there’ll be a lot of commonalities here if you’re thinking of trying similar initiatives. The Library has a different natural constituency than Research IT, different means of communicating, acts on different timescales, and has different institutional priorities. But that diversity of audiences, needs, expertise, and strengths can be an advantage if you can figure out how to split responsibilities in a way that meets everyone’s needs. It sounds like Northwestern was on its way to some success with this, although the pandemic has interrupted the work (a lot of the support efforts were built around in-person events).
Post-Commit Reviews - Cindy Sridharan
A thought-provoking post by a very experienced and successful software development manager, advocating for teams to at least consider post-commit (but pre-deploy) code reviews once they have a sufficiently mature CI/CD environment. Along the way, a nice discussion of the role of code reviews (not just for bug-finding) and CI/CD. I don’t know yet how I feel about the recommendation but it’s certainly worth reading.
Software should be designed to last - Alfonso de la Rocha
The arguments here apply particularly to research software, but there’s a real tradeoff here too.
If we’re serious about writing research software that’s reusable, we need to take into account how research software gets reused. And the fact is that a lot of research software often lies fallow for a while, used for some project and then pretty much quiescent until a (say) postdoc picks it up for another, only somewhat related project. But a good chunk of the time, it doesn’t work any more.
So it’s important to find a way to routinely write research software that can survive a while without being actively maintained by the original project. It’s not enough that research software be sustainable, it needs to be somewhat less biodegradable than it usually is. That way it can be in decent enough shape that our poor postdoc can successfully run it again. (And no, container images or static binaries aren’t a panacea here).
The argument advanced here is mainly about dependencies - whittling them down to a small number - to reduce the cross-section for breakage by a dependency. That’s probably part of the answer. I think another is automated tooling to check for dependency updates. But I worry that a more fundamental one comes down to choosing frameworks and languages that are almost certainly going to be around in ten years, and that means forgoing a lot of new and very helpful (and productive) tools.
What do you think, are there options I’m missing for building research software that lasts?
The Data Science Lifecycle Process - Charles Morris & William Buchanan
This is a template GitHub repo for data science projects. More than just a list of recommended directories and their use, it’s a set of recommendations for the lifecycle of a data science project, which could be very useful for a lot of research work. There’s a pre-configured set of issue templates and labels, a well-thought-out branching strategy, and guidance on artifact management, deployment, and operations.
It’s very well thought out and worth looking through if your team helps perform data analysis projects, even if you don’t plan to use the template. Software development and data science projects have a lot to teach each other, and research computing is really where they come together, so efforts to combine the two catch my eye. Another one that crossed my desk this week is SplitGraph, another “git for data” effort.
The Road to Kata Containers 2.0 - Horace Li, The New Stack
Nice overview on the history and near future of Kata Containers, an open source project for containers that are a little closer to VMs (they have their own instances of the kernel) and make use of hardware support in the CPUs for isolation. It’s no longer an Intel-only project but supports AMD, ARM, and IBM Power (and IBM z-series, if you’re into that).
1.1 Billion Taxi Rides using OmniSciDB and a MacBook Pro - Mark Litwintschik
While Postgres and MySQL are ubiquitous, a lot of researcher use cases come down to being able to do fast and varied analyses of the data. That’s often best handled by columnar databases, and the number and scope of open-source columnar databases are growing remarkably - there are way more than back in the day of MonetDB. OmniSciDB (née MapD) is new to me but looks like it might be worth playing with.
34th VI-HPS Tuning Workshop - 28 - 30 July, 9am - 5pm BST, Free
A well regarded workshop on HPC Tuning tools. The hands-on sessions will be on Archer2 but the content should be useful more broadly.
Engineering Management 101 - Anytime, $150 USD
The folks at Developers First have put together a 4-5 hour self-guided course for new tech managers.
Singer lets you convert data between record-based formats by defining an intermediate representation. So, for instance, if you write a ‘tap’ to read data out of an API, it becomes trivial to output it to database tables, CSVs, or Google Sheets, where ‘targets’ are already written. Hmm.
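To give a flavour of that intermediate representation: a Singer tap emits a stream of JSON objects, one per line, starting with a SCHEMA message and followed by RECORD messages that any target can consume. Here’s a minimal sketch in Python - the “users” stream and its fields are entirely made up for illustration:

```python
# Sketch of the Singer message stream a "tap" emits: a SCHEMA message
# describing the stream, then one RECORD message per row. A "target"
# reads these same messages and writes them to its destination.
import json
import sys

def singer_messages(stream, key, properties, rows):
    """Yield Singer-format SCHEMA then RECORD messages for one stream."""
    yield {"type": "SCHEMA", "stream": stream, "key_properties": [key],
           "schema": {"type": "object", "properties": properties}}
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}

# Hypothetical "users" stream, just for illustration.
messages = list(singer_messages(
    "users", "id",
    {"id": {"type": "integer"}, "name": {"type": "string"}},
    [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}],
))

for m in messages:  # a tap writes one JSON object per line to stdout
    sys.stdout.write(json.dumps(m) + "\n")
```

The nice part is the decoupling: the tap above knows nothing about where the data ends up, so pointing it at a CSV target versus a database target is just a matter of piping stdout somewhere else.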
An old debugging warstory - email wouldn’t go more than 500 miles.
An update from Twitter about their security hack, posted the day after the event. Researchers deserve at least as much transparency about the systems and software they use for science as the users of a free social media site get.
More for the “files are bad, actually” folder. Can applications recover from fsync() failures? Spoiler - mostly no.
Azure has followed AWS in beginning to release a series of “Well-Architected Framework” documents. Having a set of reference architectures and guidelines makes it much easier for teams to get started.
Google’s released a really cool autodifferentiation package for Python called JAX.
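The core trick is that JAX transforms plain Python functions: jax.grad takes a function and returns a new function that computes its derivative. A minimal sketch, using an arbitrary function I picked just for illustration:

```python
# jax.grad turns a numerical Python function into its gradient function.
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary smooth function: f(x) = x^2 * sin(x)
    return x**2 * jnp.sin(x)

# df computes f'(x) = 2x*sin(x) + x^2*cos(x), derived automatically.
df = jax.grad(f)

print(df(1.0))  # numerically equal to 2*sin(1) + cos(1)
```

Transformations compose, too: grad of grad gives second derivatives, and the same mechanism underlies JAX’s jit compilation and vectorization.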
A random forest model for choosing the right cloud provider and instance type for a HPC-type application.
A link-shortening service implemented entirely in GitHub Pages and GitHub Actions.
A templating system for JSON, YAML, or INI config files.