OpEd: In SA, you should be thinking voice-led agentic


ITWeb contributor Phillip de Wet.

Take it from someone who has tried (and often failed) to sell various newspapers and magazines: South Africa doesn’t have enough readers.

Depending on how you slice the data, something like a tenth of the population is illiterate by a formal definition. Functional literacy is a far bigger problem – you’ve no doubt seen the “can’t read for meaning” numbers out of primary schools – and aliteracy is far bigger still; a great many people simply won’t engage with anything more complex than a street sign.

There are two reasons we don’t really talk about that under the banner of “the digital divide” anymore: (a) it just got boring because there is seemingly nothing to be done about it, and (b) WhatsApp rolled out those awful voice notes that let your friends and family monologue at you.

Person-to-person asynchronous voice didn’t solve the problems of, say, a government agency trying to get citizens to use a complex online system. But it did unchain the aliterate from real-time calls.

For a while there we had a starkly two-tier society, with one section willing and able to use SMS and then over-the-top text, and the other section effectively locked out of that entire world. Then recording and sending voice became trivial. Now we just have a two-tier social class system, where respectable people use text, and cretins who can’t be bothered to take the time to properly formulate and type up their thoughts prey on their betters like time vampires.

We’re starting to see some interesting parallels to that as voice comes into the AI space. Do you have any idea how hard it was to turn text into speech that humans could listen to without going crazy because of the robotic delivery – until literally last year? Now it is as trivial as recording that WhatsApp voice note – assuming you have the token budget.

Text-to-speech is just access, though, which is important, but not the whole system. The approach of NotebookLM, on the other hand, could be transformative.

If you haven’t played with that particular Google tool, it is a domain-limited analysis and synthesis engine that makes a mean podcast and is increasingly useful for dialogue-type interrogation of material.

Or, put differently, it can take a collection of walls of text – of the kind aliterate people won’t even think about scaling with a grappling hook – and turn it into something you converse with. Anything an LLM can read can get a low-friction entry point, with multi-format re-presentation if you need.

The thing driving educationalists in the rich world crazy is how to stop that from being a bypass and avoidance tool, one that allows students to skim material by listening to the podcast version rather than achieving cognitive lift. Therein lies the parallel to the WhatsApp voice note: all of the experience and research in education says those who engage with the written text too will get better results.

But do we care, in a country where nobody is going to read it anyway?

Personally, I’d love to fight aliteracy in the fields and in the streets and never surrender, but that battle is lost for at least a generation. And here is a tool – still too expensive, still too clunky, still lacking the UX for this use case, but a tool nonetheless – that can solve some problems soon.

So why aren’t we? I simply can’t find a voice-centric approach in South Africa. Not in government policy discussions, not in the little bit of literature we have on service delivery, not in the roadmaps I’ve seen from big enterprises. And so far, nobody has been able to tell me why.

The primary vendors will layer voice on top of everything eventually, whether at an OS or a browser or an agent level, sure. That’s not a priority, though, because it introduces cost and complexity you just don’t need in a high-literacy environment.

The cost and complexity are no different in South Africa, but leaning into voice should be an easy sell nonetheless. Policymakers, even at a corporate level, love the idea of the African oral tradition perhaps more than they like the idea of leapfrog technology, but slightly less than they like the idea of addressing the bottom of the pyramid.

If you’re going agentic for anything public-facing, please build voice into the workflow. You can thank me later.