#134 - 27 Aug 2022

I want you to outgrow this newsletter; Becoming more strategic; Give more, better feedback; No one is reading your PDFs; Making Jupyter git-friendly; Effective programmers read code better; Better virtual courses

I’ve been catching up on some reader feedback now that I’m back in the swing of newsletter things. Thank you all so much for your emails! I’ve learned a lot from our exchanges, and often shifted my position on things (or at the very least corrected how I talk about those things).

I especially appreciate pushback. Some areas where I’ve really gotten pushback in the past couple of weeks are what I’ve written about utilities vs professional service firms (#127), and about strategy and strategic planning (#125, #130). Discussion of those topics has helped me realize that I’ve never really been explicit about the audience I’m prioritizing for this newsletter.

In the past I’ve worked with VPRs and those that report to them, people putting together strategic plans for national communities, and those reporting to CIOs. People in those roles are doing important work, and deserve more support than they get!

But the most urgent need in our community, and the priority for me in this newsletter, isn’t supporting those who already regularly meet with CIOs or VPRs. It’s helping the first-level managers of individual small teams, and those who aspire to have more say in the leadership of those groups. Managers and leads and aspirants who were never given any training, are still given little to no direction or support, and yet find themselves accountable for a key research support function. People trained in academia’s years-long timescales, who are now having to figure out on their own how to run a team that has people depending on them weekly for firm deliverables. Leads who are figuring out how to do research project management when the research problem is still fuzzy. People managers still new to hiring, managing, and figuring out how to structure and strategize for a service organization. People now responsible for research software development groups, or data science service teams, or computing ops teams, or informatics-heavy core facilities, or even compute-and-data-heavy spinouts from academic research groups.

There are thousands and thousands of people in your position. You are the absolute backbone of computational and data-enabled research support. The rubber hits the road, research projects succeed or fail, at the level of your teams and organizations. And I am routinely frustrated — outraged wouldn’t be too strong a word either — by how little support and training and resources you get.

Everyone is more than welcome to read this newsletter, of course, and give me feedback and pushback! There is lots of overlap with other communities. Some people leading small teams at new centres are the ones reporting to VPRs and CIOs. The people management topics are extremely widely applicable; we’re all just people, after all. I hope many folks from many communities read and benefit from the newsletter, even if not everything here will be relevant to their immediate needs.

I especially hope that more people in director and higher level roles, even VPRs and CIOs, read the newsletter too, if only to better appreciate the needs and challenges of the line managers and tech leads in our community.

But I can only write for one audience at a time, and this newsletter is for you.

Not forever, though. If I do my job well, you’ll outgrow this newsletter after a while. If you unsubscribe because you’ve become more confident in your skills and ability and knowledge, and don’t need this long weekly newsletter cluttering up your mailbox any more, I will count that as an enormous success for the newsletter and for the community. My hope is that then, as you become more senior, you’ll give incoming managers and leads more direction and support than you received. Maybe you’ll even recommend this newsletter to them. And perhaps you and I could work together in different ways.

To speed that day, let’s find new ways of working together and sharing knowledge within our professional community. I’d like to start building more peer-to-peer knowledge sharing amongst the newsletter readership as it grows. We’re not quite at a critical mass here yet, but we’re getting there. We have the (so far pretty quiet) #research-computing-and-data channel at the invaluable Rands Leadership slack, but maybe we can foster other ways of sharing each other’s stories and knowledge, too. What would work best for you? I could do short interviews with other readers, and report on them here; we could send in stories; we could build another (or build on another) online community. Let me know what you think would work best (and what wouldn’t work at all)! Just hit reply, or email me at [email protected].

And now, on to the roundup!

Managing Teams

Giving Good Feedback: Consider the Ratio - Charity Majors
How do I get better at giving feedback? - James Stanier

Routinely giving direct feedback is, I think, one of the hardest skills for many managers coming from academia and tech to master. But it’s absolutely vital. Probably everyone reading this wishes that the people they report and are accountable to gave them more feedback. Like you, your team members and peers deserve to know what the expectations for their work are, and when they are exceeding or failing to meet those expectations. If you won’t tell them these things, if you won’t share that information, how can they possibly learn it? What will they learn instead?

Majors emphasizes the need for clarity in feedback, for giving it for the right reasons, for giving it frequently (“don’t wait for a ‘wow’ moment”), and for keeping it mostly positive.

Stanier provides a meta-model for feedback - the Centre for Creative Leadership’s SBI model that Google uses (and that Majors recommends), the Manager-Tools Feedback Model, and Lara Hogan’s feedback equation all follow this basic structure pretty closely. Note that this model can and should be used to give positive, reinforcing feedback, and to give it significantly more often than negative, corrective feedback:

  • A question and a micro-yes: e.g., “Can I give you some feedback?” or “Can I share some ideas on how we could improve this?”. Back off if they say no.
  • Stating the data point on behaviour
  • Stating the impact
  • Ending on a question either seeking information or asking for change

Then he suggests a habit of seeking opportunities to give more feedback (again, mostly positive).


Technical Leadership

Reducing Friction - C J Silverio

Silverio gives us a great article on a key role for technical leadership of a team and an organization - reducing the number of things slowing the team down unnecessarily, rather than trying to speed them up somehow.

The theme is reducing the friction in the system: having enough process that people know how to do things, and keeping friction where it’s needed (cars can’t drive without friction between the wheels and the road!), but eliminating the unnecessary aerodynamic drag that comes from overly heavyweight processes or inadequate tooling and support.

The article correctly points out that everyone agrees these are bad things, but that friction builds up over time, to the point that people might not even really see it any more. “That’s just the way things work here”. Keeping friction down requires eternal vigilance.


Managing Your Own Career

How to Be a Senior Leader - Stay SaaSy
Be More Strategic (video) - Chris Williams

A nice post by the Stay SaaSy team on what becoming a more senior leader entails:

You need to build a machine that repeatably produces the outputs that you owe the [organization], rather than focusing on producing the outputs themselves through personal heroics or force of will.

The post also mentions accepting a larger scope of responsibility, winning with people you don’t necessarily like, and that you need to constantly search out feedback.

Related to the increase in scope, Chris Williams, who has had a long career in tech leadership, posted a short and terrific description of what hearing “be more strategic” means as feedback. It’s a tiktok video — yes, your faithful Gen-X correspondent is linking to tiktok videos now, and no, don’t worry, there’s no Research Computing Teams tiktok series coming up. He has a lovely diagram of the implied increase in scope of thinking along both the time and organization dimensions, which I’m shamelessly stealing to include here.

A diagram taken from Chris Williams’s video and slightly tweaked for our purposes. It shows a 2-dimensional schematic of scope in both timescales (day to years) and organizational impact (individual to institution and research community). A “typical” IC might start in the bottom left corner. As one grows in responsibility, and thus in the need to think strategically, one’s thinking moves forward along both directions, diagonally up the diagram.


Product Management and Working with Research Communities

The solutions to all our problems may be buried in PDFs that nobody reads - Christopher Ingraham, Washington Post

This is an old (2014!) article that came up on twitter and other places recently, and has an evergreen message for those of us in research and service organizations.

The key statistic that Ingraham zooms in on comes from a World Bank report on the impact of their reports. Nearly a third of over 1,500 PDF reports that had been on their website for at least two years had never been downloaded, not even once.

The suggestion in the article is that the issue was the format (PDF vs web page), and doubtless that makes some difference, but there’s a bigger issue here than technology choices.

Our work does not speak for itself. Reports and successful case studies and our centre’s web resources are not moral agents that take independent action. They are not capable of communicating themselves to those on campus or in our communities who might benefit from them or learn from them.

Only people can communicate with other people. That means assembling successful case studies (say) of your centre or product or services on your web page is a crucially important first step, but it isn’t enough. Making those results known requires constantly communicating them, again and again, to your community. It’s labour intensive, and tedious, and there’s no alternative. Otherwise, the pointer to a team who could be the solution to our researchers’ problems might sit on a web page they never read.

The Washington Post’s plot of the distribution of downloads of World Bank reports, with a long tail of highly-read reports but with almost 40% of reports downloaded fewer than 100 times, and almost a third never downloaded at all.


Ten simple rules for leveraging virtual interaction to build higher-level learning into bioinformatics short courses - Bacon et al, PLOS CompBio

One of the challenges and benefits of moving so much training to a virtual format has been having to think much more carefully about how we structure the many different kinds of communications that go on in training. Q&A, support, peer-to-peer communication during group work, as well as the actual teaching part.

I’ve put off discussing this paper since the end of July because I haven’t been able to summarize it - it covers the range of communications and activity types, and how to structure them to improve learning and build community within the context of the course. So I won’t. If you’re hosting long virtual trainings, or even want to think about how you’re going to structure long-form in-person training in the future, to support both learning and DEI, this article is very thorough and thought-generating while being a quick read.


Research Software Development

Correlates of Programmer Efficacy and Their Link to Experience: A Combined EEG and Eye-Tracking Study - Peitek et al, ESEC/FSE 2022

When we teach researchers or developers to code, we do not put nearly enough emphasis on how to read code.

We found that programmers with high efficacy read source code [in a] more targeted [way] and with lower cognitive load. Commonly used experience levels do not predict programmer efficacy well, but self-estimation and indicators of learning eagerness are fairly accurate.

In a study of a small number of people who underwent very close scrutiny (eye-tracking, electroencephalography) while working on 32 Java snippets, developers who did particularly well were consistently better at zooming in on the right part of the source code while reading it, and read the code more easily.

Other metrics that seemed to be relevant were how much time the developers spent reviewing others’ code, writing tests, and mentoring and learning. Years of programming (professional or otherwise) had very little correlation with effectiveness (I cannot emphasize this enough - get rid of your “5 years of python experience” job requirements).

Self-reported comparison with others also correlated well with performance, but I’d suggest caution here. The participants were overwhelmingly male (31/37) and we men are disproportionately likely to confidently compare ourselves favourably to others.

I also wouldn’t suggest using these results for hiring decisions - it’s one study. But it does help emphasize that if we want to help our developers grow professionally and become more effective, we might prioritize some activities that are well-motivated from many other sources as well:

  • Provide opportunities for learning and mentoring
  • Provide opportunities for review and working with different kinds of code and programming languages
  • Emphasize knowledge sharing and improving code-reading (and readability) skills for code review

Table 5 from the paper - Spearman’s 𝜌 correlation between various experience measures and programmer efficacy.  Particularly relevant metrics for us were self-reported time spent mentoring and learning, amount of code review they do, and time writing tests.


Research Data Management and Analysis

Maybe I’m the last person to know about this, but the (unofficial) R Installation Manager, rig, looks amazing for juggling multiple R installations, and it even comes with a nice menu bar app on macOS.


The Jupyter+git problem is now solved - Jeremy Howard, fast.ai

One of my biggest complaints with Jupyter has long been that it’s a terrific exploratory coding environment, but has had no real offramps for getting code under version control and unit tests, for building good documents, or for interacting with real IDEs. In those important respects, RStudio has always really shone.

I figured those were unavoidable, unfixable limitations of Jupyter. VSCode’s notebooks, giving some of the best of both IDEs and notebooks, seemed cute but struck me as just papering over the problem. Now nbdev is starting to make me think I might have been wrong.

Howard’s article talks about the git merge driver and Jupyter save hooks that come with nbdev2, which greatly improve version control with Jupyter whether or not there are conflicts.
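For context on the mechanics: as I understand it, running nbdev2’s nbdev_install_hooks in a repository sets up roughly the configuration below - a notebook-aware git merge driver, plus save hooks that strip volatile metadata (execution counts and the like) before notebooks hit disk. The exact entry names here are my best recollection, so treat this as a sketch rather than gospel:

    # .gitattributes: route notebook merges through nbdev's driver
    *.ipynb merge=nbdev-merge

    # .git/config entry written by nbdev_install_hooks (names approximate)
    [merge "nbdev-merge"]
        name = resolve Jupyter notebook conflicts with nbdev
        driver = nbdev_merge %O %A %B %P

The nice part of the design is that git itself never needs to understand notebooks; the driver is handed the base, local, and remote versions and resolves the JSON-level noise itself.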


Why are bioinformatics workflows different? - Benjamin Siranosian

Bioinformatics and other scientific data analysis pipelines increasingly rely on workflow managers like cromwell, nextflow, snakemake, toil, and others. These tools are described in the same way, and use much of the same terminology, as various data engineering workflow tools (think airflow, dagster, prefect, argo…), and people from one community can be confused about the difference between the two.

This is a handy article to have in your back pocket if a bioinformatician/data engineer wonders why the workflow orchestration tools used by data engineers/bioinformaticians are so weird and ill-suited to what they need. Siranosian concisely describes the different problem regimes the two different kinds of tools are solving and why the needs are different.


Research Computing Systems

What you should and (probably) shouldn’t try from SRE - Steve Smith and Ali Asad Lotia, Equal Experts

This is an older article that caught my eye - a very pragmatic take on adopting some of the practices from Site Reliability Engineering (SRE) that offer a more principled approach to providing a good customer experience for services and systems, without taking on the things that aren’t feasible unless you’re a huge organization.

Their capsule summary of the two recommendations for teams of our kind of size:

  • Try availability targets, request success rate measurements, Four Golden Signals [throughput, error rate, latency, saturation: LJD], Service Level Indicators, and Service Level Objectives. But not all at once! (See the sketch after this list.)
  • Don’t try error budgets or an SRE on-call team.
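To make “Service Level Indicators and Objectives” a bit more concrete: an SLI is just a measurement of how the service is actually behaving, and an SLO is a target you commit to for that measurement. Here’s a minimal sketch in Python - the request-log structure and the 99.5% target are made up for illustration:

    from dataclasses import dataclass

    @dataclass
    class Request:
        status: int        # HTTP status code
        latency_ms: float

    def success_rate_sli(requests: list[Request]) -> float:
        """SLI: the fraction of requests that completed without a server error."""
        if not requests:
            return 1.0
        ok = sum(1 for r in requests if r.status < 500)
        return ok / len(requests)

    SLO_TARGET = 0.995  # a made-up objective: 99.5% of requests succeed

    window = [Request(200, 12.1), Request(200, 48.0), Request(503, 0.4)]
    sli = success_rate_sli(window)
    print(f"SLI = {sli:.1%}; SLO {'met' if sli >= SLO_TARGET else 'missed'}")

The same shape works for any of the Four Golden Signals; the discipline is in picking the handful of indicators your users actually feel, and targets you can defend.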

Emerging Technologies and Practices

Using Firecracker and Go to run short-lived, untrusted code execution jobs - Stanislas

Is anyone using firecracker yet? I’d love to hear about it if you are.

Firecracker is a virtual machine manager which can spin up very lightweight VMs in not much longer than it takes to spin up a docker container. This allows for very strongly isolated execution of code that doesn’t have to be particularly trusted.

Stanislas writes about their experience writing a code benchmarking service as a student project. This is very much “run untrusted code as a service”, and the untrusted code runs in firecracker VMs.

This is a fun use case, and other applications within research computing and data teams (CI/CD; coding playgrounds for public teaching) come quickly to mind.
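If you want a feel for the moving parts, here’s a minimal, untested sketch of driving Firecracker from Python. Firecracker exposes a REST API over a unix socket; you configure the machine, a kernel, and a root filesystem, then start the microVM. This assumes a firecracker process is already running with --api-sock /tmp/firecracker.socket, and the kernel and rootfs paths are placeholders:

    import json
    import socket
    import http.client

    class UnixHTTPConnection(http.client.HTTPConnection):
        """http.client over a Unix domain socket, which is how Firecracker exposes its API."""
        def __init__(self, socket_path):
            super().__init__("localhost")
            self.socket_path = socket_path

        def connect(self):
            self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.sock.connect(self.socket_path)

    def put(conn, path, body):
        conn.request("PUT", path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        if resp.status >= 300:
            raise RuntimeError(f"{path}: HTTP {resp.status}")

    conn = UnixHTTPConnection("/tmp/firecracker.socket")
    put(conn, "/machine-config", {"vcpu_count": 1, "mem_size_mib": 128})
    put(conn, "/boot-source", {"kernel_image_path": "vmlinux",  # placeholder kernel image
                               "boot_args": "console=ttyS0 reboot=k panic=1"})
    put(conn, "/drives/rootfs", {"drive_id": "rootfs",
                                 "path_on_host": "rootfs.ext4",  # placeholder root filesystem
                                 "is_root_device": True, "is_read_only": False})
    put(conn, "/actions", {"action_type": "InstanceStart"})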


Random

Terrific news, everyone - there’s a Unicode-to-EBCDIC encoding, which means that we know how to type poop emojis on punched cards.

XScreensaver was released 30 years ago.

A cute postgres-in-the-browser playground for learning and testing your psql skills without setting up (and messing up) a postgres server.

Logic gates in minesweeper.

Performing efficient anti-joins.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.