#163 - 9 Apr 2023

Measure What Matters - Logic Models. Plus: Reducing the lottery factor; Polars; LLMs in Production; Cloud-first databases

Measure What Matters - Logic Models

Last week (#162) I spent a lot of time on training evaluation and the Kirkpatrick model, but the underlying idea has much wider application.

We choose training interventions — or other services or products for our community — because, consciously or unconsciously, we have some mental model for how this work will impact research and scholarship in our organization. There’s typically some pretty simple mechanism that we’ve hypothesized. That mechanism is observable, even if only qualitatively, and from those observations we can improve the intervention’s impact.

Tools Like Logic Models Describe How Our Activities Drive Research Impact

The Kirkpatrick model can be thought of as evaluating along different stages of some logic model:

A simple logic model: inputs leading to activities leading to outputs leading to outcomes.

The logic of our intervention is that we marshal some inputs and perform some activities; those activities generate some outputs, and those outputs lead to some desired outcomes.

Crucially, the outcome is the entire point. There’s no reason to do the work except for the outcomes. This isn’t a hobby, where the activity is its own goal. We’re professionals, performing badly needed work, in a space where there’s far more that could usefully be done than we can possibly do. Everyone is relying on us for the outcomes. Not the outputs. Not the activities. Certainly not the inputs.

The purpose of our team isn’t to teach classes, or run a computer system, or write software, or curate data, or perform data analyses. Those are activities we perform. They are useful. It’s even possible that they are the most useful things we could be doing. But they aren’t the purpose. The purpose is the impact on research or scholarship the activities enable downstream.

Tools Like Logic Models Tell Us What To Measure And Improve

By measuring the quality of our inputs, the effectiveness of the activities, the usefulness of the outputs, and their applicability to some outcome, we can iterate and make our desired outcomes more likely.

By designing our interventions with the outcome in mind - like with hiring (#135) - we are much more likely to achieve those outcomes. And it affects everything - which inputs we choose, what activities to perform, what outputs to aim for.

And yet, we tend to focus on the inputs or maybe the activities, because they’re easy to measure. But they’re the least relevant of all. Bad inputs can limit our impact. But fantastic-quality inputs of the wrong thing, or inputs with no mechanism connecting them to outputs or outcomes, are pointless.

The Kirkpatrick model for training evaluation implies a fairly straightforward, linear, logic model:

  • Inputs: materials, venue, instructors… - evaluated via Reaction-type surveys.
  • Activities: instruction, exercises, pedagogy… - evaluated via educational assessments like pre- & post-tests
  • Outputs: Behaviour change; using the new skills/knowledge - evaluated via followup discussions, surveys, observation
  • Outcomes: Research & scholarship impacts - evaluated via followup discussions, testimonials, grants, publications, invited talks, citations…

Problems at each stage of the mechanism will limit impact downstream. So being able to reason about and observe the process from start to finish is what helps us design, iterate, and improve what we do and how we do it to have the maximum impact.

The Mechanism, Not the Names, Is What Matters

You may well have noticed that even in the simple Kirkpatrick model above, the distinction between inputs and activities is a little fuzzy.

And if you have done volunteer board work for nonprofits, or are otherwise privy to that community’s discussions, you’ll know that there are sometimes five levels in a logic model, or the names used differ. Or that there are debates about logic models vs theory of change vs outcome mapping.

Those discussions are that community’s R vs Python, vi vs emacs, Ubuntu vs Rocky, MySQL vs Postgres, Rust vs C++. For practitioners who do that kind of work all day, those distinctions doubtless matter a great deal. Happily, we can proceed unburdened by such concerns.

The important thing from our point of view is that we end up with some kind of proposed, observable mechanism for how our resources and what we do add up to some meaningful impact.

These Tools Can Guide Programmes of Activities: RCT

I spent an hour or so trying to come up with a made-up example of a non-trivial logic model that would make sense to most of the readership, and I couldn’t come up with anything I liked. So let’s tackle a real example we should all have some familiarity with, and that I spend some time thinking about - our nascent ResearchComputingTeams.org community.

Logic model of the RCT community. There are my inputs (12 hours/week, pre-existing hands-on experience, perspectives from conversations with the community); activities (job board, talks, writing the newsletter and documents, and conversations) and potential future activities (recording interviews and videos, coaching, consulting, recruiting support); current outputs (tools and techniques for our managers/leaders to be more successful; more confident managers and leads) and future outputs (better roles; a community of peers aiming to improve practice; jobs with broad scope); and the intended outcome (a self-sustaining community of new strong research computing, software, and data leaders, in jobs where they can make a difference)

The diagram is above, with possible future items in dashed boxes.

Let’s look at the desired outcome (just one in this case): “Self-sustaining community of new strong research computing, software, and data leaders, in jobs where they can make a difference”. That is not something that could be measured quantitatively with any degree of rigour, and yet it’s clear enough that any proposed effort could be evaluated against it. It’s a beacon to steer by.

You certainly noticed that the diagram is kind of messy. That’s good. It’s important that with finite resources — and everyone always has finite resources — the activities and outputs interlock to support the outcome or outcomes. It means the activities reinforce each other, rather than just being a todo list of unrelated items.

Ah yes, resource constraints. I have about 12 hours a week for this (and Manager, Ph.D.) right now. That’s not a problem to lament; it’s a boundary condition to be incorporated into the solution. Sometimes, advocating for more resources is a possible activity; here it isn’t in the short term, and in any case, there’s no point in searching for more resources until we’re already having the greatest possible impact with existing resources.

The RCT effort in general, and this logic model in particular, is a hypothesis about having impact on leadership in research computing, software, and data science/engineering/curation. That hypothesis needs to be confirmed through data, and the activities and outputs have to be improved with that data to support the outcome. My main data collection mechanism at the first few steps, for the foreseeable future, is conversations with readers and community members. That’s a vital feedback mechanism. Right now, lack of that input is the rate-limiting step. There’s just not enough discussion. I have two levers to work with there:

  • Increase engagement per community member somehow
  • Increase community size

There’s no point in building out any of the other activities until more feedback is happening; again, I’m aiming for impact, not simply doing these activities for their own sake.

Another argument for focussing on increasing community size and engagement: you’ll have noticed there are currently no activities leading to the desired output of “community of RCT peers”. There was a #research-computing-and-data channel on Rands Leadership Slack, but we lacked the critical mass to keep that going, so I let it get archived. More community members, and more engaged ones, will allow some peer-to-peer support through a number of mechanisms, which will build towards the desired impact without increasing my time involvement.

As a result of this current focus on community size and engagement, you may have noticed that the interviews - which were going pretty well, I think - are on hold for now. They’re good and useful things to do, for a few reasons! But as done, they don’t address the biggest problem I have in increasing impact, and they take real time, so they’re on pause.

Also - and I hate this - I’ve recently realized that I’m going to put the job board on hold for now. Again, it’s useful, I get comments that people really like it, it helps me keep a finger on the hiring pulse. I personally think it’s really important, so the argument isn’t that it’s not good to have. But it just doesn’t address the bottleneck in impact RCT currently suffers from, and it takes a surprising amount of time to populate and maintain. If I want to have as much impact as possible with the current resources, the energy going into that has to be redirected, and the job board has to be put on hold.

Pausing the interviews and job board will help me respond faster to the input I do get (I’ve been unacceptably slow on this lately) and put things in place to get more input. That has to be the priority. If I’m not measuring how things are going, there’s no point in doing the work, because I won’t be doing the things that matter.

One last thing on resources and inputs: just because I can’t free up more of my time doesn’t mean I can’t find ways to increase the amount of activity. I can find other individuals or groups to work with - other teams in the same space aren’t competition, they’re potential collaborators (#142). One thing I’d like to do is to see if I can find someone or some group in academic/public sector core facilities, or private sector data science, to work with on some activities. That would allow me to leverage my time with that of others.

The Value Of These Tools Is In How They Clarify Thinking

You might have noticed that my use of the logic model above was kind of sloppy. There are items in the middle that could arguably be either activities or outputs, or should be shuffled around. If I were using this to get money from a funder, I’d have to be stricter about meeting their categorizations, and redraw the diagram and label things accordingly.

But we don’t actually care about those distinctions in and of themselves. We use tools like this principally to clarify our own and our teams’ or stakeholders’ thinking. If we have to distort the tool a little in the service of that clarity, that’s fine. Like “strategy” vs “tactics” vs “objectives” vs “goals” (#130), the format and taxonomy aren’t what matters. (Unless this is principally a compliance exercise that gatekeepers demand. If it is, do what you have to do). What matters is the insight that tools like this or others help you come to.

There are many other tools out there — SWOT, Wardley maps, business model canvas, value chains, where to play/how to win, and so on. They are just tools. They have proven themselves useful to people in some context in the past; if they’re suggested to you, maybe they’ll be useful for you, too, or maybe they won’t be. None of them produce answers; they’re just ways of organizing your or a team’s thinking and communication in a way that might or might not help in your current circumstance. If you try one and it doesn’t do much for you, it’s certainly possible that you’re doing it wrong, but it’s more likely that it just doesn’t bring to the surface or clarify or communicate anything that you need surfaced or clarified or communicated right now. That’s fine.


Anyway, that discussion of RCT in an RCT issue was a little more meta than I had planned. Was it useful? Have you tried similar exercises when designing programmes of activity? What’s been helpful for your team, and what hasn’t? I’d love to hear - hit reply, or email me at [email protected].

And now, on to the roundup - quite short this holiday weekend. If this is a long weekend for you, I hope you find it enjoyable; if it’s a particularly special time of the year for you I hope you find it meaningful.

Managing Teams and Individuals

This week over at Manager, Ph.D. I talked about constantly finding small ways to make time to work on, rather than in, the team. And there were articles in the roundup on:

  • The art of delegation
  • Being the (emotional) thermometer, not thermostat
  • Steps towards strategic thinking
  • Letting go of doing all the deciding yourself

Technical Leadership

Reducing the Lottery Factor, for Data Teams - Jacob Adler

Adler writes about data science or engineering teams, but the main considerations of small, expert, fast-learning teams with very mobile team members apply to teams across all of our disciplines.

We’re lucky to have such excellent foundations of knowledge, but at what point does all that siloed knowledge become a liability? A hypothetical to demonstrate why: what happens if they win the lottery next week and resign? We may find ourselves scrambling trying to interpret their work, unprepared for certain situations, or even locked out of accounts.

I like how Adler sets up this article, with growing and evolving knowledge sharing practices as the team grows.

For one team member,

  • It is never too early to start documenting
  • Have a single source of truth for credentials
  • Be verbose with comments
  • Have a changelog for data sets, code…

Once you have two to five members,

  • It’s never too early to have a really strong onboarding process - both to share knowledge and to discover gaps in documentation
  • Assign scavenger hunts
  • Pair programming (or analysis or systems work or…) and peer review for knowledge sharing (in both directions)

Once you have five to ten team members,

  • As specialities develop, develop strong cross-training across practices
  • Start recording videos for the knowledge base to capture context that is hard to share in text

I’d add that for our teams, external knowledge sharing (giving talks and presentations, or writing blog posts) can start happening at the 2-5 team member size, and can be super useful for visibility and professional development.


Research Data Management and Analysis

Polars for initial data analysis, Polars for production - Itamar Turner-Trauring

After years of pandas being the only real Python data frame library in town, Polars came on the scene about a year ago. It’s still fast-moving, which has pros and cons (I count 13 small breaking changes in the most recent release). But it’s a very fast data frame library for Rust and Python (exposing a Python API for data-related work was a very good product decision), and it had an Apache Arrow back end for fast columnar work (which is mostly what we want in our line of work) well before the latest pandas did.

In this article Turner-Trauring demonstrates using Polars for data exploration (where eager evaluation is useful, to keep one’s options open) and production (where lazy evaluation is useful, for speed and memory optimization).
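
To make that distinction concrete, here’s a minimal sketch of the two modes in Polars’ Python API (the file name and column names here are invented for illustration):

    import polars as pl

    # Eager mode, for exploration: the CSV is read immediately into a DataFrame,
    # so you can poke at it interactively and keep your options open.
    df = pl.read_csv("measurements.csv")   # hypothetical file
    print(df.describe())                   # quick summary of every column
    print(df.filter(pl.col("temperature") > 0)
            .select(["site", "temperature"])
            .head())

    # Lazy mode, for production: build up the query first, and Polars optimizes
    # the whole plan (e.g. reading only the columns it needs) before executing
    # it when collect() is called.
    result = (
        pl.scan_csv("measurements.csv")
          .filter(pl.col("temperature") > 0)
          .select(["site", "temperature"])
          .collect()
    )

The eager and lazy versions express the same query; the difference is just when, and how cleverly, the work gets done.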

Maybe relatedly, if you’re going to play with some new technology to learn it, why not use some fun datasets? Esther Schindler advocates for using fun data sets to play with, and provides some pointers, in Groovy Datasets for Test Databases.


Emerging Technologies and Practices

A lot of teams are going to be asked to host Large Language Model (LLM) inference services in the near term. MLOps in general is a lot more involved than just hosting an RShiny app or Jupyter notebook, and LLMs - especially if the plan is for them to frequently be updated - are still more involved. Here’s a promising-looking free LLMs in Production workshop this coming week (13 Apr).


Building a database in the 2020s - Ed Huang, PingCAP/TiDB

Huang discusses considerations when building a cloud-first database. It’s interesting outside of that context, though.

I think a lot of our teams (especially systems teams, but also software development teams) don’t yet have enough experience with building and maintaining cloud-native systems to realize that it can be a lot more than just running a program in a VM on AWS instead of on the local cluster. There are opportunities, and user expectations, in cloud-based API-driven services which are very different from the more static deployments we’re used to.

Not that everything has to be elastically scalable, multi-tenant, and serverless from the point of view of the user, but those are possibilities and capabilities that cloud deployments give us, and that we should avail ourselves of more often than I think we do. Huang’s discussion of why that is matters pretty broadly.


Random

The term database “sharding” maybe came from an in-universe reference to parallel universes in a late 1990s MMORPG?

A simple mutex web service.

A simple command line tool for handling URLs - trurl.

Fine-Tuning LLaMa models on Stack Overflow questions and answers, with RLHF. Approaches like this are going to be worth looking into in the near future for answering repeated user questions at any centre large enough to have a large database of user questions.

Relatedly, there’s never been a better time to start learning or exploring deep learning, with new tutorials and tools popping up everywhere - Hello Deep Learning, GPT4All, or tuned lens for examining what’s going on in a transformer model layer-by-layer.

Walking carefully through current best practices in implementing hash tables (don’t implement your own hash table).

SSH authorization key experiments.

A text-based IDE which requires only VT100 capability, which means it can be run essentially anywhere - orbiton.


That’s it…

And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.

Have a great weekend, and good luck in the coming week with your research computing team,

Jonathan

About This Newsletter

Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.

So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.

This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.


Jobs Leading Research Computing Teams

This week’s new-listing highlights are below in the email edition; the full listing of 193 jobs is, as ever, available on the job board. This will be the last job board for a while; I’ll leave it up on the web page for a couple of weeks.