Odd podcast downloads

https://op3.dev/show/55f8007aec094899a02fd44273aa6558 shows massively high numbers for Podnews Daily recently. As above, shows that normally get about 2,500 downloads are suddenly getting 49,000 downloads. But it’s not all good news.
- Stats for March 1 show 51% of downloads from Malaysia and a further 38% of downloads from Indonesia.
- Top devices for March 1 = Android Phone (97.8% of all downloads).
- Seemingly, the only useragent being used is Dalvik.
Now - all of these audio downloads appear to be the AAC version, and tagged as ?from=googleaudionews - which means they’ve come from a special secret RSS feed that I serve to the systems that run Google News’s audio service (“Hey, Google, play the latest news”). Nobody would be asking for this file unless they had seen the Google News RSS feed that I’ve given to Google (and Google alone).
I’ve checked the logs for the URL of that special RSS feed. For the entire month of February, I don’t see anything out of the ordinary - the only thing asking for that secret RSS feed is Googlebot, from an IP range that appears to be owned by Google. (Apart from one listener on Overcast who found it there. Weird. Anyway, that’s fine.)
So, this must be coming from Google News in some way. But, again, I don’t know how - only that the Google News secret RSS feed is the only place where we add ?from=googleaudionews on the end of the audio URL.
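As a rough illustration of how a feed script might add that tag, here is a minimal Python sketch. It is not the actual Podnews script; the function name `tag_audio_url` and the example.com URL are made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_audio_url(url: str, source: str) -> str:
    """Return url with a from=<source> query parameter added (or replaced)."""
    parts = urlsplit(url)
    # Keep any existing query parameters except a previous from= tag.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "from"]
    query.append(("from", source))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical audio URL, for illustration only.
tagged = tag_audio_url("https://example.com/audio/episode.m4a", "googleaudionews")
print(tagged)
```

Because each feed gets its own value, any download carrying the tag can be traced back to the one feed that produced it.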
So, to the audio files. Here are the server logs for the audio files for a day. Spot the ones saying ?from=googleaudionews in this big list. I’ve removed the IP addresses for privacy reasons, but “Location” shows the Amazon CloudFront point that the user has connected to, which is usually the one closest to where they are physically located; you can see plenty of KUL (Kuala Lumpur, Malaysia), SIN (Singapore), and CGK (Jakarta, Indonesia). All of the listeners are using normal IP addresses that look like residential Indonesian or Malaysian addresses.
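Picking the tagged requests out of a big list and tallying them by location could be sketched like this. The rows below are invented for illustration (the real log has more fields and far more lines), and the field names `location` and `uri` are assumptions. The only thing taken from the log format is that the location code starts with a three-letter airport code.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical log rows, not real data.
rows = [
    {"location": "KUL50-P2", "uri": "/audio/episode.m4a?from=googleaudionews"},
    {"location": "KUL50-P2", "uri": "/audio/episode.m4a?from=googleaudionews"},
    {"location": "CGK50-P1", "uri": "/audio/episode.m4a?from=googleaudionews"},
    {"location": "SIN2-P3", "uri": "/audio/episode.m4a?from=googleaudionews"},
    {"location": "DFW57-P4", "uri": "/audio/episode.mp3"},
]


def edge_counts(rows, tag="googleaudionews"):
    """Count requests whose from= parameter equals tag, grouped by airport code."""
    counts = Counter()
    for row in rows:
        params = parse_qs(urlsplit(row["uri"]).query)
        if params.get("from") == [tag]:
            counts[row["location"][:3]] += 1
    return counts


counts = edge_counts(rows)
print(counts.most_common())
```

Matching the parsed `from` value exactly, instead of searching for a substring, keeps a later tag such as googleaudionews1 from being counted as googleaudionews.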
All of them have “Dalvik” as their user-agent - that’s a generic useragent from an Android phone if the developer doesn’t set one - but all of them are running different phones. Here’s an example - lots of differences between Android 13, 14 and 15 (and some even using Android 8!). The latest version of Android is v15.
(The other interesting thing about these audio downloads is that they’re also all HTTP/1.1 connections. It seems that some older HTTP stacks used by Android apps only support HTTP/1.1 internally - notably the HttpClientHandler class, which is a .NET class used by Xamarin-built apps.)
Dalvik/2.1.0 (Linux; U; Android 13; 220333QAG Build/TKQ1.221114.001)
Dalvik/2.1.0 (Linux; U; Android 14; V2248 Build/UP1A.231005.007)
Dalvik/2.1.0 (Linux; U; Android 14; ALI-NX1 Build/HONORALI-N21)
Dalvik/2.1.0 (Linux; U; Android 14; RMX3997 Build/UP1A.231005.007)
Dalvik/2.1.0 (Linux; U; Android 15; FCP-N49 Build/HONORFCP-N49)
Dalvik/2.1.0 (Linux; U; Android 15; CPH2437 Build/AP3A.240617.008)
Dalvik/2.1.0 (Linux; U; Android 15; CPH2581 Build/AP3A.240617.008)
Dalvik/2.1.0 (Linux; U; Android 8.1.0; DUA-L22 Build/HONORDUA-L22)
Dalvik/2.1.0 (Linux; U; Android 14; CRT-NX1 Build/HONORCRT-N31)
Dalvik/2.1.0 (Linux; U; Android 8.1.0; CPH1803 Build/OPM1.171019.026)
Dalvik/2.1.0 (Linux; U; Android 15; CPH2607 Build/SP1A.210812.016)
Dalvik/2.1.0 (Linux; U; Android 14; LLY-NX1 Build/HONORLLY-N31)
Dalvik/2.1.0 (Linux; U; Android 15; ELP-NX9 Build/HONORELP-N39)
Dalvik/2.1.0 (Linux; U; Android 13; RMX3491 Build/RKQ1.211119.0
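To see how varied these devices are, the user-agent strings can be split into Android version, model and build. This is a small sketch using a regular expression of my own and five of the strings listed above; it assumes the "Dalvik/x (Linux; U; Android V; MODEL Build/ID)" shape seen in this list and makes no claim to cover every Dalvik user-agent.

```python
import re
from collections import Counter

DALVIK_RE = re.compile(
    r"^Dalvik/(?P<vm>[\d.]+) \(Linux; U; Android (?P<android>[\d.]+); "
    r"(?P<model>.+?) Build/(?P<build>[^)]+)\)?$"
)

# Five of the user-agents from the list above.
user_agents = [
    "Dalvik/2.1.0 (Linux; U; Android 13; 220333QAG Build/TKQ1.221114.001)",
    "Dalvik/2.1.0 (Linux; U; Android 14; V2248 Build/UP1A.231005.007)",
    "Dalvik/2.1.0 (Linux; U; Android 14; ALI-NX1 Build/HONORALI-N21)",
    "Dalvik/2.1.0 (Linux; U; Android 15; FCP-N49 Build/HONORFCP-N49)",
    "Dalvik/2.1.0 (Linux; U; Android 8.1.0; DUA-L22 Build/HONORDUA-L22)",
]

parsed = [m.groupdict() for ua in user_agents if (m := DALVIK_RE.match(ua))]

# Tally by major Android version, and collect the distinct device models.
by_major = Counter(p["android"].split(".")[0] for p in parsed)
models = {p["model"] for p in parsed}

print(by_major)
print(sorted(models))
```

Many distinct models spread across several Android versions looks like real, different phones, not one script repeating a single fake string.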
Some of those files are also being downloaded by things like this:
Mozilla/5.0 (Fuchsia) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 CrKey/1.56.500000
… now, this is a Google Nest speaker, the fancy one with a screen, I happen to know (because that’s what uses Fuchsia). This looks like a plausible Google News client.
Also, these:
Mozilla/5.0 (Linux; Android 8.1.0; vivo 1807 Build/OPM1.171019.026; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/111.0.5563.58 Mobile Safari/537.36 GSA/14.14.16.26.arm64
Mozilla/5.0 (Linux; Android 9; SM-A505F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.92 Mobile Safari/537.36 GSA/13.48.11.26.arm64
Mozilla/5.0 (Linux; Android 7.1.1; CPH1801 Build/NMF26F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/119.0.6045.193 Mobile Safari/537.36 GSA/13.20.11.23.arm64
These are the Google app, which is pre-installed on all Android phones. Spot the GSA/ at the end. Again, I’d be comfortable seeing a Google News audio request coming from something like this.
But I don’t think the others are genuine Google News clients. Just a standard Dalvik call is not that helpful. (Dalvik is the thing that runs Android apps - and is similar to AppleCoreMedia: it’s the name given by the device when the developer hasn’t bothered to add a name of their own.)
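The three groups of user-agents described here can be told apart with a few substring checks. This is a sketch, not a robust classifier: the markers (Googlebot, CrKey, GSA/, Dalvik/) are simply the tokens visible in the strings quoted in this post, and the labels are my own.

```python
def classify(user_agent: str) -> str:
    """Put a user-agent string into one of the rough buckets discussed above."""
    if "Googlebot" in user_agent:
        return "googlebot"
    if "CrKey/" in user_agent:
        return "google cast device"
    if "GSA/" in user_agent:
        return "google app"
    if user_agent.startswith("Dalvik/"):
        return "generic android (dalvik)"
    return "other"


# User-agents quoted in this post.
samples = [
    "Mozilla/5.0 (Fuchsia) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/130.0.0.0 Safari/537.36 CrKey/1.56.500000",
    "Mozilla/5.0 (Linux; Android 9; SM-A505F Build/PPR1.180610.011; wv) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.92 "
    "Mobile Safari/537.36 GSA/13.48.11.26.arm64",
    "Dalvik/2.1.0 (Linux; U; Android 14; V2248 Build/UP1A.231005.007)",
]

labels = [classify(ua) for ua in samples]
print(labels)
```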
I’m confused. Is this really Google traffic? If so, why are only users in Malaysia and Indonesia affected by it, and why is it suddenly happening now?
And if it isn’t Google traffic, how are they getting hold of the special feed’s details, when they’re not asking for it?
As of 4.30pm my time, I’ve changed the ?from= tag that my script adds to the audio URL, to read “googleaudionews1” - at least that might help work out if the feed is being reingested. I can put anything in there - I wonder what else might be interesting.
I don’t see any referers in the two logs of data.
So… what’s going on? Let me know. I’m most confused.
Day two
So, as above, I changed the script to add googleaudionews1 to the end of the audio URLs, just so we have clarity that it’s the secret RSS feed that Google News uses. It is - all these downloads are now marked as “googleaudionews1”.
So, all these downloads are coming from the secret RSS feed that only Google News uses - and, as I can see from the logs, only Google consumes.
The above is the start of the avalanche of downloads. First, at 11:27:00, a single download from Dallas, as Googlebot-News Audio grabs the audio. Then, five seconds later, the start of a deluge of downloads from Malaysia and Indonesia, with “Dalvik” as the user agent. There is no “Dalvik” until the Googlebot has been to visit. So, more evidence tying it to Google.
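That ordering check (no Dalvik request before the Googlebot fetch, and how long the gap is) can be sketched as below. Only the 11:27:00 fetch and the five-second gap come from the log described here; the other rows, and the exact user-agents on them, are invented to make the example runnable.

```python
from datetime import datetime

# (time, user-agent) pairs in log order. Rows after the first two are hypothetical.
events = [
    ("11:27:00", "Googlebot-News Audio"),
    ("11:27:05", "Dalvik/2.1.0 (Linux; U; Android 14; V2248 Build/UP1A.231005.007)"),
    ("11:27:05", "Dalvik/2.1.0 (Linux; U; Android 15; CPH2437 Build/AP3A.240617.008)"),
    ("11:27:06", "Dalvik/2.1.0 (Linux; U; Android 13; 220333QAG Build/TKQ1.221114.001)"),
]


def first_seen(events, needle):
    """Return the time of the first event whose user-agent contains needle."""
    for stamp, user_agent in events:
        if needle in user_agent:
            return datetime.strptime(stamp, "%H:%M:%S")
    return None


bot_time = first_seen(events, "Googlebot")
dalvik_time = first_seen(events, "Dalvik")
gap_seconds = (dalvik_time - bot_time).total_seconds()
dalvik_only_after_bot = dalvik_time > bot_time

print(gap_seconds, dalvik_only_after_bot)
```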
I can also tell you that the IP addresses (hidden in the above screenshot) all belong to residential ISPs. I’d be really pleased if so many Indonesians and Malaysians were listening to a scrappy podcast about podcasting news, but I doubt that they are.
In the meantime, I’ve taken that M4A that I’m creating for Google and Siri, and changed it from a 128kbps/44.1kHz (6.8MB) file to a 32kbps/24kHz (1.9MB) file. If I’m getting 20x the downloads, the least I can do is produce a 3x smaller file. (To be honest, it sounds fine - AAC HE v2 in mono is actually quite decent at this low a bitrate.)
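The back-of-envelope arithmetic behind that trade-off, using only the file sizes and the 20x figure quoted above, can be written out as a quick sketch (the variable names are mine):

```python
old_size_mb = 6.8          # 128kbps/44.1kHz file
new_size_mb = 1.9          # 32kbps/24kHz file
download_multiplier = 20   # "20x the downloads"

# How much smaller the new file is.
size_ratio = old_size_mb / new_size_mb

# Bandwidth now, relative to normal traffic served with the old file.
relative_bandwidth = download_multiplier * new_size_mb / old_size_mb

print(f"file is {size_ratio:.1f}x smaller")
print(f"bandwidth is still about {relative_bandwidth:.1f}x normal")
```

So the smaller file takes a good chunk out of the bill, but 20x the downloads still means several times the usual bandwidth.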
The mystery continues! Hit refresh here tomorrow to see more…