#28 - Link Roundup, 12 June 2020

Hi!

Sorry for the late roundup this week - the week kind of got away from me.

But there’s been lots to talk about, so on we go!

Managing Teams

Architecture Jams: a Collaborative Way of Designing Software - Gergely Orosz
Proposals and Braintrusts - Nathan Broslawsky

These two articles both describe approaches to usefully open up architectural or other proposals to input from a group. The first, an “Architecture Jam”, is sort of half-brainstorming, half-architectural white boarding session; it can work remotely, but is definitely synchronous. The second is more asynchronous - writing up a proposal, and sending it off to a group of people whose job is, explicitly, to improve the proposal.

Either could work in our context. There are two keys for the architecture jams. First is to have a plausible but ideally (IMHO) unfinished proposal to give the group something concrete to kick off discussion. Second is to facilitate discussion well to make sure people are contributing (some good pointers are here, which is worth reading in its own right).

The more formal proposal and request for comments has the advantage that more of it can be asynchronous; but you’d really need to be sure of the starting point, and to ensure that those entrusted with improvements feel like they can propose significant changes.

On anti-racist management practices - Rachel Hands, Managing Equitable, Effective Teams
Bias doesn’t start with skin color - Chelsea Troy

Rachel Hands’ article is a nice list of resources and describes the mindset one has to be in to make use of the resources. Given how hard it is to make changes in our organization, we certainly don’t feel like we hold power, but as managers we do.

Chelsea Troy’s article is useful to read with Hands’: it’s easy to think we’re above being racist because we don’t recoil from people with black skin or think in crude stereotypes, but the more collectively damaging consequences come from behaviours and reactions that are more subtle than that.

In academic and adjacent circles, we like to think that we’re above needing such things, but we’re really not. A glance at #BlackInStem is enough to confirm that, but this week we’ve also seen a pretty scathing internal report leak from Oxford about white supremacist bias and a Canadian prof managed to get a pretty racist and sexist article published in a chemistry journal (arguing that diversity of workforce was making Chemistry worse). And tech of course is a dumpster fire. So in academic tech we have some work to do.

Product Management and Working with Research Communities

Ten Simple Rules for Starting (and Sustaining) an Academic Data Science Initiative - Micaela S. Parker, Arlyn E. Burgess, and Philip E. Bourne

Starting up any big multi-stakeholder, interdisciplinary research computing efforts is fraught and is more about people than technology. Three authors who have been through it bring you a list of ten rules. Four stand out for this audience:

Don’t Try to Own Everything - you’re trying to build partnerships, having your group do everything isn’t helpful; relatedly,
Leverage Core Service Groups - in particular, “Libraries have been in the information business for centuries”;
Establish a Set of Guiding Principles - What’s in scope, and what’s out?; finally
Hire Staff, and Support Them - probably high on our minds but less so at the VPR-type layer; it isn’t feasible to do something big by having a lot of people working on it off the corner of their desks.

Bioinformatics challenges in multidisciplinary research - Mina Ali

The reason I prefer to talk about Research Computing as a whole rather than research software development/systems/curated databases/…, or breaking things out into bioinformatics/data science/simulations/… , is that the same issues come up over and over again.

We’ve had articles in the roundup before about setting up a data science team in an organization and the challenges of having it be its own thing (and thus isolated) vs having team members scattered and individually embedded (and so don’t get the growth opportunity of working on multiple projects with colleagues). Mina Ali’s article talks about exactly the same issue with bioinformaticians, because it’s exactly the same problem.

Balancing the need for having some kind of embeddedness in problems with being part of a community is tricky, but it has to be done for the work and the individuals to succeed.

Research Software Development

Evidence for the importance of research software - Michelle Barker, Daniel S. Katz, Alejandra Gonzalez-Beltran

A nice list of papers, talks, and other resources on the topic of the impact of research software. There’s also a continually updated Zotero group library and Github repository.

Lessons learned in a decade of research software engineering GPU applications - Ben van Werkhoven, Willem Jan Palenstijn, and Alessio Sclocco

This is an interesting paper written by a team that has worked on GPU applications for digital forensics, image analysis, physical simulation, analysis pipelines, and geospatial databases. There are technical discussions in here — about use of host and device memory, libraries for various languages, that different parts of the code will need different approaches, and the usefulness for roofline plots to understand what is constraining performance at any given point.

More interesting to me are the higher-level observations:

The difficulties of dealing with essentially legacy code, inclduing the need to write tests before doing any porting
GPUs often allow going to larger scales, but that often means that other parts of the overall pipelines break down and need work
The need for managing researcher expectations - results won’t be bit-for-bit the same
The need to communicate performance results carefully (e.g. “It’s 100x faster on GPU!” “…Well, the thing about your old code was…”)

It’s a short easy read but the higher-level discussions carry over well beyond the particular case of working with GPUs.

A Look at Chapel, D, and Julia Using Kernel Matrix Calculations - Chibisi Chima-Okereke

An interesting view of using Chapel, D and Julia for some fairly basic numerical operations. It covers both the performance and the ergonomics aspects of the language, and to an extent also the responsiveness of the communities.

On the performance side, I don’t like rules like this in these sorts of comparisons:

We disallow any use of BLAS

It makes the performance comparisons unrealistic. There’s no shortage of interesting small computational problems that don’t have highly optimized libraries available that could be chosen instead.

That pet peeve of mine aside, I think the article is a good overview of what it takes to get performing code in those languages; the author’s sympathies like with D (it’s a D language blog this is posted on, after all) but Dr Chibisi Chima-Okereke is evenhanded about the strengths shortcomings of each of the languages and the community support for each.

There Are No Bugs, Just TODOs - Lukáš Linhart
Defects are not the fault of programmers - Hillel Wayne

These are two useful articles for clarifying the purposes of issue trackers — they’re not about blame or bad code, they’re about todo lists. They make different points; the first is that anything that doesn’t help the issue tracker be good for generating todo lists is something that can be usefully stripped out. The second is that bugs are about surprising interactions as often as they are about bad code (I think this is especially true in research software which is often quite subtle).

Research Computing Systems

Effort to Fund National Research Cloud for AI Advances - AI Trends
New ‘supercomputer’ to aid research for 90 firms - Shawn Pogatchnik, Irish Independent

I’m really interested in the accelerating drive to create data science/AI cloud type infrastructure, often supporting private sector users, outside of the usual academic research computing structures. The AI Trends article talks about a US bill supported by a bipartisan coalition of Representatives and Senators; Shawn Pogatchnik’s article is about a modest sized cluster procured by Ireland’s National Centre for Applied Data Analytics and Artificial Intelligence for member companies. The trend for these sorts of systems seem to be more cloudy and less HPCy, both in business model and architectures.

Scientific Computing World is running a survey on HPC for Science, with questions for both users and systems managers.

Emerging Data & Infrastructure Tools

Getting machine learning to production - Vikki Boykis
The Ultimate Guide to Deploying Machine Learning Models - Luigi Patruno, ML In Production

More research computing users are not only using research computing systems to train models, but to serve predictions based on those (frequently updated) models. That puts interesting demands on our teams. Putting the models into production combines the challenges of making curated data sets available and of running services in production.

Vikki Boykis, who has a great newsletter you should consider, described the start to finish of a simple, troll-y ML project which generates fake VC-written Medium think pieces. The machine learning part is just a tiny part of the whole process, and this relatively short read gives a good overview of what is involved. Luigi Patruno’s multi-blog-post writeup is more comprehensive, but digging into those details is a lot easier after seeing a small worked example like Boykis’.

MicroK8s, a lightweight but full kubernetes, is now natively available on MacOS and Windows for teams looking for developer or CI kubernetes setups or for those who want to learn Kubernetes beyond what minikube can do.

Calls for Proposals

SC20 Early Career program - Applications Due 31 July 2020

The Early Career program provides a one-day series of special sessions for early-career researchers, educators, and technical professionals. This includes academic, industry, and laboratory staff and post-docs within the first five years of a permanent position. The Early Career program is available to participants by application only. The program aims to help participants secure a better understanding of the issues and challenges faced while navigating a successful research career. The program will include engaging interactive sessions aimed at helping participants develop their professional skills, as well as a strategic vision for their future.

Random

Maybe of interest to some here - a Juypter kernel for sqlite.

This twitter thread has a lot of nice responses with templates for project strategy, planning, and roadmap documents.

Moving away from “master/slave” and “whitelist/blacklist” type terms in our projects is a good and useful thing to do. It’s really easy to rename git default branches - git branch -m master <newname>” and git push -u origin <newname> is all it takes. And given how many different ways projects use their default branch, the word master doesn’t even mean anything useful. For your project, is it the trunk? production? prerelease? Any of those are better and more informative names.

If you use nginx in your shop you likely already know this, but this week I learned about topngx and its predecessor ngxtop which provide top-like access to nginx logs.

When I was doing fluid dynamics simulations full time, one of my favourite fun facts was that F1 racing limits the amount of aerodynamic simulations teams can do because too many simulations would be an unfair advantage. Those rules are about to get tighter.

Taichi is a language for fast interactive graphics embedded in Python. The examples are super cool.

A cursed C implementation of tic-tac-toe, with all logic and even input handling in the printf statement.