Display stats about PPA usage

Bug #139855 reported by Alexandre Vassalotti
This bug affects 77 people
Affects            Status         Importance   Assigned to      Milestone
Coccinelle         Invalid        Wishlist     Unassigned
Herodotos          Invalid        Wishlist     Unassigned
Launchpad itself   Fix Released   Low          Julian Edwards

Bug Description

There should be a way to see the number of people using a PPA. This would give PPA maintainers a way to quantify the importance of their personal archive. Examples of useful information:
   - Downloads stats for each package in the archive
   - Distribution release used
   - Number of users subscribed to the archive over time
   - Number of download requests over time
   - Amount of data transferred over time

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

This is a *really* interesting idea! At this stage it's difficult to do because the system which publishes the packages on the net is totally disconnected from the system which tracks them internally. Also, "subscribe to the PPA" really means "has the PPA in their sources file", and that has all the usual issues in terms of figuring out how many users you have in a distro, but I think with some cunning log analysis we could generate traffic profiles for PPAs and packages published in PPAs.

Changed in soyuz:
importance: Undecided → Low
Revision history for this message
Uphaar Agrawalla (uphaar) wrote :

I'm really interested in the number of downloads from my PPA for various individual packages present. Will it be very difficult to have in the near future?

Revision history for this message
Mark Shuttleworth (sabdfl) wrote : Re: [Bug 139855] Re: Display stats about PPA usage

We don't currently have that information, it requires a fairly tricky
bit of analysis because of the way we publish the files. It's something
we'd like to have too!

Mark

Celso Providelo (cprov)
Changed in soyuz:
status: New → Triaged
Revision history for this message
Dominik Stadler (dominik-stadler) wrote :

FYI, the idea at http://brainstorm.ubuntu.com/idea/2105/ talks about the same thing.

Revision history for this message
Fabien Tassin (fta) wrote :

Here are some thoughts, hoping they will help make this bug move forward..

I assume that the raw data is available somewhere. No one explained how the PPA
files are spread to the world, but as the user URL is unique, it seems reasonable
to assume that the data is in the form of httpd or web proxy logs somewhere.
Then it's a matter of post-processing that. I also assume we can ignore all the direct
downloads from the LP pages (librarian), focusing on what's available through APT
should be enough.

The next problem is to interpret those data.
The OP asked for some precise figures:

1/ Downloads stats for each package in the archive

what do we want to know?

Ideally, number of users for each version over time: if my assumption about the
logs is correct, they only show downloads, with no way to distinguish between
upgrades and new installs, so accounting just the number of downloads will not
give an accurate representation of the number of installations. The information
has to come from the user's machine, identified by a unique ID (like with
popcon) - not the IP address - maybe transported in a (fake) http referrer. It
will still not catch removals though..

Number of downloads over time: this seems possible, but tricky to represent as
there's an unknown (and increasing) number of versions.
http://popcon.debian.org/stat/release.png is a good example as to why it is
tricky. For fast moving PPAs, such as dailies, or trunk/tip builds, it's even
worse.

2/ Distribution release used

this should be easy. I also find this info very valuable, as there's no point
spending time maintaining backports for a distro used by no one.
It should probably not be based on the indexes stats, as it's possible to have
multiple versions of the same repository, esp. when a new ubuntu is released,
PPA maintainers often take time to start producing debs for the new version
(debs are not copied like in the real archives).
It should come from the download stats, aggregated by package numbers.

3/ Number of users subscribed to the archive over time

i don't think we'll ever get stats per user, it's always per machine (not to
mention proxies/caches).

4/ Number of download requests over time

hm, this is 1/, sort of..

5/ Amount of data transferred over time

this one should be trivial.

In the meantime, what about giving the PPA owners access to their raw logs,
properly anonymized, for ex by md5-ing IP addresses? The privacy risk will be
the same as with popcon (i.e. if there's just 1 user for a given package, it's
safe to assume it's the PPA maintainer, making him a target), but given a md5,
finding the IP to exploit is, well, you know..
This could allow users to experiment, and maybe find good ideas, create mockups..

Revision history for this message
William Grant (wgrant) wrote :

On Sat, 2009-08-22 at 14:48 +0000, Fabien Tassin wrote:
> Here are some thoughts, hoping they will help make this bug move
> forward..
>
> I assume that the raw data is available somewhere. No one explained how the PPA
> files are spread to the world, but as the user URL is unique, it seems reasonable
> to assume that the data is in the form of httpd or web proxy logs somewhere.
> Then it's a matter of post-processing that. I also assume we can ignore all the direct
> downloads from the LP pages (librarian), focusing on what's available through APT
> should be enough.

I believe that all requests are currently served via Apache from one
server. The data should all be in Apache logs on that server.
Conveniently enough, most of the backend work is already done: the same
log parsing technique is used to count project downloads.

> The next problem is to interpret those data.
> The OP asked for some precise figures:
>
> 1/ Downloads stats for each package in the archive
>
> what do we want to know?
>
> Ideally, number of users for each version over time: if my assumption about the
> logs is correct, they only show downloads, with no way to distinguish between
> upgrades and new installs, so accounting just the number of downloads will not
> give an accurate representation of the number of installations.

Correct. All we can see is the HTTP request; it could be an
installation, upgrade, reinstallation, or just somebody stuffing up the
stats!

> The information
> has to come from the user's machine, identified by a unique ID (like with
> popcon) - not the IP address - maybe transported in a (fake) http referrer. It
> will still not catch removals though..

And it would be a privacy violation. Making each installation send a
unique tracking number in apt requests is dodgy and would not be
accepted by the community.

> Number of downloads over time: this seems possible, but tricky to represent as
> there's an unknown (and increasing) number of versions.
> http://popcon.debian.org/stat/release.png is a good example as to why it is
> tricky. For fast moving PPAs, such as dailies, or trunk/tip builds, it's even
> worse.

I'm not sure that it's possible without privacy issues to reliably track
the number of users of a daily PPA, particularly since update-manager
now only pops up once a week for that sort of update. The raw download
numbers will certainly allow comparisons with other PPAs, though.

> 2/ Distribution release used
>
> this should be easy. I also find this info very valuable, as there's no point
> spending time maintaining backports for a distro used by no one.
> It should probably not be based on the indexes stats, as it's possible to have
> multiple versions of the same repository, esp. when a new ubuntu is released,
> PPA maintainers often take time to start producing debs for the new version
> (debs are not copied like in the real archives).
> It should come from the download stats, aggregated by package numbers.

Fairly easily done, and rather important. But there are complications --
a particular binary package may live in multiple distroseries, and the
apt User-Agent doesn't include the distroseries, just the apt version.

...


Revision history for this message
Michael Bienia (geser) wrote :

On 2009-08-22 14:48:58 -0000, Fabien Tassin wrote:
> In the meantime, what about giving the PPA owners access to their raw logs,
> properly anonymized, for ex by md5-ing IP addresses? The privacy risk will be
> the same as with popcon (i.e. if there's just 1 user for a given package, it's
> safe to assume it's the PPA maintainer, making him a target), but given a md5,
> finding the IP to exploit is, well, you know..

MD5-ing (or any other hash) the IP address doesn't anonymize it properly
and it can easily be undone. You just need a big table mapping md5sum to IP
address. And that table isn't even that big, as there are only 2^32 IPv4
addresses in total and you need only 128 bits for the md5sum and 32 bits for
the IP address. That's 20 bytes per record and only 80 GB for the whole
table. And that's only the naive approach to the space needed (more of an
upper bound).
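
The arithmetic and the lookup-table attack described above can be sketched as follows. This is only an illustration: the helper names `md5_ip` and `build_table` are made up for the example, and the table is built for a single /24 documentation network rather than the whole IPv4 space.

```python
import hashlib
import ipaddress


def md5_ip(ip):
    """Hex MD5 digest of a dotted-quad IP address string."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def build_table(network):
    """Map md5(ip) -> ip for every address in the given network."""
    return {md5_ip(str(addr)): str(addr)
            for addr in ipaddress.ip_network(network)}


# Storage estimate from the comment: 16-byte digest + 4-byte address.
RECORD_BYTES = 16 + 4
table_gib = (2 ** 32) * RECORD_BYTES / 2 ** 30  # size in GiB for all of IPv4

# Reversing a "hashed" address by plain lookup in a small range.
table = build_table("192.0.2.0/24")
leaked_hash = md5_ip("192.0.2.77")
recovered = table[leaked_hash]
```

Scaling the same dictionary to every IPv4 address is just a matter of disk space and time, which is why an unkeyed hash gives little protection here.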

Revision history for this message
Jonathan Lange (jml) wrote :

I just had a chat about this bug on #launchpad-dev with wgrant & noodles.

It seems that all that's left is:
 * a small amount of backend work
 * to figure out how best to display the stats on the UI
 * to display the stats on the UI

The backend work involves:
 * PPA versions of cronscripts/parse-librarian-apache-logs.py and lib/canonical/launchpad/scripts/librarian_apache_log_parser.py
 * A table analogous to LibraryFileDownloadCount

wgrant assures me that this is not much work.

Once that's done, I suggest that we add a UI early in the development cycle so that it functions on edge. We shouldn't spend too much time figuring out what's best, instead we should put _something_ up and ask for feedback from interested users. Because this is a popular feature, it would be worth blogging about it and posting to launchpad-users.

Then, we should act on that feedback before the production rollout.

Revision history for this message
Andrew Ross (rockclimb) wrote :

I'd find this kind of thing really useful. I'd be happy to do some bulk data analysis to see if there is a way we might reliably compute extra useful stats. Things like the distro release used should be fairly straightforward given we also see the requests for the Release files.

In terms of anonymisation, as has been pointed out, hashing an IP address doesn't work. IP's could be replaced by their Autonomous System number quite easily, and that might still enable some questions to be answered.

Revision history for this message
Andrew Ross (rockclimb) wrote :

I meant to say, the other anonymisation option is to use a keyed hash. Either the key is kept safe, or if you only need to be able to correlate IP's within a single data set which is being created, then delete the key once you're done.
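
A minimal sketch of the keyed-hash idea, assuming HMAC-SHA256 as the keyed hash (the comment does not name one) and a hypothetical helper `make_pseudonymiser`. Without the key, the lookup-table attack on plain hashes no longer applies, while equal IPs still map to equal tokens within one data set.

```python
import hashlib
import hmac
import secrets


def make_pseudonymiser(key):
    """Return a function that maps an IP string to a keyed-hash token."""
    def pseudonymise(ip):
        return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return pseudonymise


# One random key per data set; discard it afterwards if the tokens only
# need to be correlated within that data set.
pseudonymise = make_pseudonymiser(secrets.token_bytes(32))

token_a = pseudonymise("192.0.2.1")
token_b = pseudonymise("192.0.2.1")   # same IP, same key -> same token
token_c = pseudonymise("192.0.2.2")   # different IP -> different token

# A different key gives unrelated tokens for the same IP.
other_token = make_pseudonymiser(secrets.token_bytes(32))("192.0.2.1")
```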

Revision history for this message
Jo Shields (directhex) wrote :

Okay, looking at LibraryFileDownloadCount, and as someone who runs a moderately successful (>10k users per month at its peak) PPA, I have the following thoughts:

There are two types of data to record - the number of people downloading individual packages (which PPAFileDownloadCount would adequately cover), and the number of people "subscribing" to a PPA by adding it to sources.list (a different measure entirely).

To track "subscribers", the easiest mechanism is to count the number of unique hits on Packages.gz in a given time period, on the understanding that the numbers will be flawed as some people have multiple machines behind a single IP (NAT, mirrors) and others have multiple IPs for a single machine (dynamic IP).

It would be interesting in any stats UI to see those stats divided by architecture.

Whilst I would be satisfied with something simple like "X people subscribed to this PPA in March" being shown, there are many interesting ways to graph historic data - for example showing downloads of a single package over time, with markers signifying new uploads of that package. People running a PPA would be able to look at their graphs & correlate them to exciting new versions or well-publicised blog posts.

As a very very rough guide to the data I find interesting, check the (hand-collated) stats on http://directhex.mfgames.com/hardy.html, which is a simple sort | uniq | wc -l job
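
The "unique hits on Packages.gz" count could be sketched roughly like this. Assumptions: the logs are in Apache combined format, the request path follows the layout /<owner>/<ppa>/ubuntu/dists/<series>/<component>/binary-<arch>/Packages.gz, and 304 responses count as hits because apt revalidates its cached index. The function name and the sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# Matches index fetches only; pool downloads and other requests are ignored.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"GET /(?P<owner>[^/]+)/(?P<ppa>[^/]+)/ubuntu/dists/(?P<series>[^/]+)/'
    r'[^/]+/binary-(?P<arch>[^/]+)/Packages\.(?:gz|bz2) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) '
)


def unique_subscribers(lines):
    """Count distinct client IPs per (owner, ppa, series, arch)."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m.group("status") not in ("200", "304"):
            continue
        key = (m.group("owner"), m.group("ppa"),
               m.group("series"), m.group("arch"))
        seen[key].add(m.group("ip"))
    return {key: len(ips) for key, ips in seen.items()}


SAMPLE = [
    '203.0.113.5 - - [02/Feb/2010:10:33:02 +0000] "GET /someowner/ppa/ubuntu/dists/hardy/main/binary-i386/Packages.gz HTTP/1.1" 200 5120 "-" "Debian APT-HTTP/1.3"',
    '203.0.113.5 - - [03/Feb/2010:10:37:42 +0000] "GET /someowner/ppa/ubuntu/dists/hardy/main/binary-i386/Packages.gz HTTP/1.1" 304 - "-" "Debian APT-HTTP/1.3"',
    '198.51.100.9 - - [03/Feb/2010:11:00:00 +0000] "GET /someowner/ppa/ubuntu/dists/hardy/main/binary-amd64/Packages.bz2 HTTP/1.1" 200 4096 "-" "Debian APT-HTTP/1.3"',
    '198.51.100.9 - - [03/Feb/2010:11:00:05 +0000] "GET /someowner/ppa/ubuntu/pool/main/f/foo/foo_1.0_amd64.deb HTTP/1.1" 200 99999 "-" "Debian APT-HTTP/1.3"',
]
counts = unique_subscribers(SAMPLE)
```

Because the series and architecture appear in the index path, the same pass also gives a per-release and per-architecture split, with the NAT and dynamic-IP caveats mentioned above.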

Revision history for this message
Tom Fogal (tfogal) wrote :

directhex <email address hidden> writes:
> Whilst I would be satisfied with something simple like "X people
> subscribed to this PPA in March" being shown, there are many
> interesting ways to graph historic data [. . .]

Honestly, right now I would be happy just to get the raw data. Even "1
download occurred on February 2nd at 10:33:02; 1 occurred at 10:37:42"
etc. would be a boon.

Our funding is partially based on the number of users of our software,
and I can't track users of our PPA at all right now. I'm currently
estimating this to be 10% of our user base, so I'm hoping it's not a
big deal.

Is there anything us users can do to in any way expedite this process?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

> Is there anything us users can do to in any way expedite this process?

Writing code is the ultimate way to expedite it.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Access to the raw data for PPA owners seems reasonable as a first step.

Revision history for this message
Roger Light (roger.light) wrote :

Would providing access to data via launchpadlib be another fairly straightforward intermediate step?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Absolutely, and it's what I was about to suggest when you beat me to the punch :)

From jml's comment above:

{{{
The backend work involves:
 * PPA versions of cronscripts/parse-librarian-apache-logs.py and lib/canonical/launchpad/scripts/librarian_apache_log_parser.py
 * A table analogous to LibraryFileDownloadCount
}}}

The only thing missing from here is that we export the new table on the API. For an experienced Soyuz developer all this would be about 3-4 days work total.

If anyone wants to have a go at this I'm very happy to mentor you.

Revision history for this message
William Grant (wgrant) wrote :

So, I spent a few hours over the weekend putting something together on top of the refactoring I did last year. It tracks binary downloads and exposes them in potentially useful (but probably horrifyingly slow) ways through the API. lp:~wgrant/launchpad/export-detailed-binary-download-stats is the last branch in the series at the moment.

Nicolas Palix (npalix)
Changed in herodotos:
importance: Undecided → Wishlist
status: New → Invalid
Changed in coccinelle:
importance: Undecided → Wishlist
status: New → Invalid
Revision history for this message
William Grant (wgrant) wrote :

The DB patch and model additions for tracking binary download counts landed this morning. The collection script and API export of the raw values will land in the next few days. Then we need to work out how to display it nicely.

Changed in soyuz:
status: Triaged → In Progress
assignee: nobody → William Grant (wgrant)
Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Very nice to see that, William! This will be a very popular landing :-)

Revision history for this message
Steven Sproat (sproaty) wrote :

Cool, looking forward to it!

Revision history for this message
William Grant (wgrant) wrote :

It's all landed, and the code has been on production for a couple of weeks. I'm just waiting for the counter script to be set up, which will hopefully happen in the next day or two...

Revision history for this message
Tom Fogal (tfogal) wrote :

Sounds like this is set up and running, but I'm not seeing anything on my PPA page, nor when viewing package details.

When this is in production, where will we see it? Or should we expect to use the LP API only to obtain this information, at least in the short term?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Unfortunately there were some problems with resource usage so the script that scans the logs had to be stopped. William is aware of this and will no doubt have it fixed in due course.

When it does get into production, it's very likely that the initial way of getting the data will be via the API. UI changes will come a bit later.

Revision history for this message
Alwin Garside (yogarine) wrote :

Anything new to report? I'm really looking forward to this.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Wednesday 21 July 2010 14:17:43 you wrote:
> Anything new to report? I'm really looking forward to this.

We're about to get the server job up and running to scan the Apache logs.
I'll report back when it's ready to use.

Revision history for this message
Jonathan Thomas (jonoomph) wrote :

Hi Julian,
It's been about 2 months since your last post on this issue. Any updates for us anxiously awaiting PPA statistics?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

We're still doing testing but the fix is imminent. I'll let you know when
it's been released properly.

Revision history for this message
alecive (alecive) wrote :

Hi Julian,

Since it has been a long time (only two months, actually.. :) ) since your last post, I'm wondering if you're going to release this very useful stuff.. :D

Another question: how can I know if you have released it?

Thanks!

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Hi. There were some serious complications with the script that was processing
the stats (basically, it was eating all the memory on the PPA host server).
We've put some fixes in and they've not worked as well as we'd like so there's
still some work to do. Unfortunately other things have taken priority
recently but sorting out this script is the next thing on my list, so I hope
to get it done soon. I'll be posting back here when it's released. (You'll
see the status change)

Changed in soyuz:
assignee: William Grant (wgrant) → Julian Edwards (julian-edwards)
Revision history for this message
Julian Edwards (julian-edwards) wrote :

I've fixed all the problems with the stats scanner, we should be able to put it live after the release next week.

Changed in soyuz:
status: In Progress → Fix Committed
milestone: none → 10.12
Revision history for this message
alecive (alecive) wrote :

Great!

Thanks a lot! :)

Revision history for this message
Nicolas Palix (npalix) wrote :

Hi,

On Thu, Dec 2, 2010 at 11:08 AM, Julian Edwards
<email address hidden> wrote:
> I've fixed all the problems with the stats scanner, we should be able to
> put it live after the release next week.
>

Great. Thanks.

Is there something to do to enable it ?
Where will the stats be available ?

--
Nicolas Palix
http://proton.inrialpes.fr/~npalix/

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Hi everyone, wow, there's a lot of subscribers on here.

This bug is about to be marked "Fix Released" but it will take quite a long time for the log scanner to catch up to current times. Please set your expectation levels appropriately! I've no real idea how long it will take to catch up but I guesstimate a few days. I will post back here again when I can see that it's up-to-date, so please don't add comments requesting more information.

To use the stats when they become available, you need to use the Launchpad API (See https://help.launchpad.net/API and https://launchpad.net/+apidoc).

There are three API methods on the binary_package_publishing_history object (see https://edge.launchpad.net/+apidoc/devel.html#binary_package_publishing_history):

 * getDailyDownloadTotals
 * getDownloadCount
 * getDownloadCounts

To repeat - these will NOT return any useful data yet. I will post back here when you can rely on the information.

Thanks for your patience everyone.
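
As a rough sketch of how these methods might be combined, the helper below sums getDailyDownloadTotals across several publications. Assumptions: getDailyDownloadTotals returns a mapping from date strings to counts, the 'production' service root with the 'devel' API version is used, and 'ppa-stats-sketch' is an arbitrary application name. `fetch_publications` needs network access and is not called here; `FakePublication` is a stand-in so the aggregation can be tried offline.

```python
from collections import Counter


def fetch_publications(owner, ppa_name):
    """Return the published binaries of a PPA (requires network access)."""
    from launchpadlib.launchpad import Launchpad
    lp = Launchpad.login_anonymously('ppa-stats-sketch', 'production',
                                     version='devel')
    ppa = lp.people[owner].getPPAByName(name=ppa_name)
    return ppa.getPublishedBinaries(status='Published')


def daily_totals(publications):
    """Sum per-day download totals over all given publications."""
    totals = Counter()
    for pub in publications:
        for day, count in pub.getDailyDownloadTotals().items():
            totals[day] += count
    return dict(sorted(totals.items()))


class FakePublication:
    """Offline stand-in exposing only getDailyDownloadTotals()."""

    def __init__(self, per_day):
        self._per_day = per_day

    def getDailyDownloadTotals(self):
        return dict(self._per_day)


demo = [FakePublication({'2010-12-30': 3, '2010-12-31': 5}),
        FakePublication({'2010-12-31': 2})]
combined = daily_totals(demo)
```

With real data one would call daily_totals(fetch_publications(owner, ppa_name)) instead of using the stand-in objects.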

Changed in soyuz:
status: Fix Committed → Fix Released
Revision history for this message
Yury V. Zaytsev (zyv) wrote :

Wait, does this actually mean that there is not going to be any user-visible download counters and such? I think some kind of rudimentary UI is absolutely necessary...

Revision history for this message
William Grant (wgrant) wrote :

Getting it out over the API is just the first step. There will be UI, but we first have to work out how to usefully display it. Any ideas?

Revision history for this message
Robert Collins (lifeless) wrote :

We should reopen this I think - it was filed requesting display of stats, which we haven't achieved yet.

Revision history for this message
Yury V. Zaytsev (zyv) wrote :

This was exactly why I asked, since this bug is now "Fix Released"...

Revision history for this message
Jo Shields (directhex) wrote :

I don't have any data for my PPA yet, so I can't be sure this is correct, but it SEEMS to do what it should, minus any numbers yet:

from launchpadlib.launchpad import Launchpad

# Where launchpadlib keeps its cached credentials and HTTP responses.
cachedir = "/home/jms/.launchpadlib/cache/"
launchpad = Launchpad.login_with('badger', 'production', cachedir)
badgerppa = launchpad.me.getPPAByName(name='ppa')
desired_dist_and_arch = "https://api.launchpad.net/1.0/ubuntu/lucid/amd64"
# One line per published binary: name, version, download count.
for individualbadger in badgerppa.getPublishedBinaries(
        status='Published', distro_arch_series=desired_dist_and_arch):
    print(individualbadger.binary_package_name + "\t"
          + individualbadger.binary_package_version + "\t"
          + str(individualbadger.getDownloadCount()))

Edit to suit your desired PPA name, release, etc. It'll give a count for total downloads of all currently published packages for amd64 lucid (counts are per-release, per-version and per-arch).

Julian, do you have an example of a PPA with valid data which we can poke, pending a full run of the log scanner?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

@Rob and Yury: The description clearly says "There should be a way to see the number of people using a PPA." While I agree a web UI would be nice, the API UI meets the goal of this bug and is a lot easier to make available for the people who were clamouring for this feature. We'll add a web UI later - if you want to file a bug about that I will triage it.

@directhex: As I previously said, please wait until I post back saying that the data is ready.

Revision history for this message
Nicolas Palix (npalix) wrote :

 @directhex: I think it should be "edge" instead of "production".
Moreover, it seems it is querying
the 1.0 API version. How should we say to launchpadlib to use the devel one ?

Revision history for this message
Nicolas Palix (npalix) wrote :

On Thu, Dec 9, 2010 at 2:55 PM, Nicolas Palix <email address hidden> wrote:
>  @directhex: I think it should be "edge" instead of "production".
> Moreover, it seems it is querying
> the 1.0 API version. How should we say to launchpadlib to use the devel one ?
>

launchpad = Launchpad.login_anonymously('PPA stats', 'edge', cachedir,
                                        version="devel")

Should the URI of desired_dist_and_arch be adjusted accordingly ?

--
Nicolas Palix
http://proton.inrialpes.fr/~npalix/

Revision history for this message
Alin Andrei (nilarimogard) wrote :

This works:

from launchpadlib.launchpad import Launchpad

cachedir = "/home/andrei/.launchpadlib/cache/"
# Ask for the 'devel' API version rather than 1.0.
launchpad = Launchpad.login_with('ppastats', 'edge', cachedir, version='devel')
badgerppa = launchpad.me.getPPAByName(name='webupd8')
desired_dist_and_arch = "https://api.edge.launchpad.net/devel/ubuntu/maverick/i386"
# One line per published binary: name, version, download count.
for individualbadger in badgerppa.getPublishedBinaries(
        status='Published', distro_arch_series=desired_dist_and_arch):
    print(individualbadger.binary_package_name + "\t"
          + individualbadger.binary_package_version + "\t"
          + str(individualbadger.getDownloadCount()))

But there's very little data available for now...

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Julian, w.r.t. visualising this, could I propose a simple "heat" metric:

5 flames: top 1% PPA in the past 6 months
4 flames: top 10%
3 flames: top 25%
2 flames: top 50%
1 flame: top 75%
damp squib: less.

Just count all downloads across all archs and releases equally, but only
count them for the latest version of any package in there, so you can't
fudge it by rapid-fire publishing.

Mark
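
One way the proposed scoring could be sketched is shown below. The function names are hypothetical, a record is assumed to be a (package, upload_order, downloads) tuple where a larger upload_order means a newer version, and the percentile rank is assumed to be computed elsewhere (1 meaning "top 1%").

```python
def latest_version_downloads(records):
    """Sum downloads over all archs/releases, latest version of each package only."""
    latest = {}  # package -> (upload_order, downloads so far)
    for package, upload_order, downloads in records:
        if package not in latest or upload_order > latest[package][0]:
            # A newer version replaces whatever was counted before.
            latest[package] = (upload_order, 0)
        if upload_order == latest[package][0]:
            latest[package] = (upload_order, latest[package][1] + downloads)
    return sum(total for _, total in latest.values())


def flames(percentile_rank):
    """Map a percentile rank (1 = top 1%) to the number of flames, 0 = damp squib."""
    for limit, count in ((1, 5), (10, 4), (25, 3), (50, 2), (75, 1)):
        if percentile_rank <= limit:
            return count
    return 0


records = [
    ("foo", 1, 100),   # old version: ignored
    ("foo", 2, 30),    # latest version, i386
    ("foo", 2, 5),     # latest version, amd64
    ("bar", 1, 7),
]
score = latest_version_downloads(records)
```

Counting only the latest version is what keeps rapid-fire publishing from inflating the score: older uploads drop out as soon as a newer one appears.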

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Thanks for the suggestion Mark. Everyone, I've filed bug 688141 about this, please feel free to add suggestions on it and I'll probably break it up into smaller more manageable bugs over time.

Revision history for this message
alecive (alecive) wrote :

I ran the code posted by Alin Andrei. It works, but I have some questions:

1. desired_dist_and_arch = "https://api.edge.launchpad.net/devel/ubuntu/maverick/i386" -> here you specify the distro and the architecture. What should I do if I want to see all distros and architectures in one shot?

2. individualbadger.getDownloadCount() returns the actual number of people that use the specific ppa, right? If so, why, when I ask for example for the counter for ubuntu lucid i386 of my ppa, does it return 7, even though I know that this ppa has at least 10 subscribers? I know because they are my computers, or those of friends close to me, and I manage these computers! Should I do something else, or wait a few more days (as Julian wrote in post #33)?

3. I installed the api with "sudo apt-get install python-launchpadlib".. Maybe I have to install the latest launchpadlib (available with bzr)?

Revision history for this message
Alin Andrei (nilarimogard) wrote :

"getDownloadCount()" gets, like its name says, the number of downloads for a package. There is no way to get the actual number of people that use a PPA, just the number of times some packages in the PPA have been downloaded (either externally or by adding the PPA).

Revision history for this message
Alex Mandel (wildintellect) wrote :

If you want to see all, just leave off that variable, it's optional. The numbers are not up to date, so you'll need to keep waiting for them to propagate. #33 states they have no idea how long it will take and that a post will occur here.

I've put together a python script based on what was posted here that demonstrates how to get a ppa by team as opposed to by person and how to wildcard search for a specific set of packages of any distro and arch. All of this is in the online api documentation.
http://bazaar.launchpad.net/~wildintellect/%2Bjunk/launchpadapi-examples/annotate/head%3A/ubuntugis-qgis-stats.py
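
A small sketch of the "leave off that variable" approach: fetch every published binary and group the counts by series and architecture afterwards. It assumes each publication exposes a distro_arch_series_link whose last two path segments are the series and the architecture, as in the URLs used earlier in this thread; `FakePublication` is a stand-in for the objects that getPublishedBinaries(status='Published') would return.

```python
from collections import defaultdict


def downloads_by_series_and_arch(publications):
    """Total getDownloadCount() per (series, arch) across all publications."""
    totals = defaultdict(int)
    for pub in publications:
        # e.g. ".../devel/ubuntu/maverick/i386" -> ("maverick", "i386")
        series, arch = pub.distro_arch_series_link.rstrip("/").split("/")[-2:]
        totals[(series, arch)] += pub.getDownloadCount()
    return dict(totals)


class FakePublication:
    """Offline stand-in with the two members the helper relies on."""

    def __init__(self, link, count):
        self.distro_arch_series_link = link
        self._count = count

    def getDownloadCount(self):
        return self._count


BASE = "https://api.launchpad.net/devel/ubuntu/"
demo = [
    FakePublication(BASE + "maverick/i386", 4),
    FakePublication(BASE + "maverick/i386", 6),
    FakePublication(BASE + "lucid/amd64", 3),
]
by_target = downloads_by_series_and_arch(demo)
```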

Revision history for this message
alecive (alecive) wrote :

Thanks both for the reply:

@alin -> ok for this, but it seems natural that the number of ppa subscribers is less than or equal to the number of downloads: if somebody subscribes to a ppa, it's obvious that he'll download the software inside it at least once!

@alex -> this answers the question above: since the numbers are not up to date, using this now is quite useless because the numbers are not correct. Thanks also for the note about the desired_dist_and_arch variable.

Revision history for this message
Alin Andrei (nilarimogard) wrote :

@alecive: packages can be downloaded from a PPA without adding it.

Revision history for this message
alecive (alecive) wrote :

yes, so the download count returns a number that is in any case greater than the number of ppa subscribers.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

The number of downloads from a PPA is an awkward indication, because
it's affected by:

 - the number of subscribers
 - the number of packages
 - the number of uploads of those packages

What people *typically* want to know is "how many people are getting
this app from that PPA". We'll have to do a little magic to get good
estimates of those numbers from the raw data we have.

Mark

Revision history for this message
Jo Shields (directhex) wrote :

On Fri, 2010-12-17 at 18:55 +0000, Mark Shuttleworth wrote:
> The number of downloads from a PPA is an awkward indication, because
> it's affected by:
>
> - the number of subscribers
> - the number of packages
> - the number of uploads of those packages
>
> What people *typically* want to know is "how many people are getting
> this app from that PPA". We'll have to do a little magic to get good
> estimates of those numbers from the raw data we have.

I run a reasonably popular third party repository (not as a PPA, due to
lack of download metrics), and have always found the number of
subscribers to be a relatively simple estimate for popularity - although
that might be because the packages I offer are unrealistic to download
one-by-one rather than by adding to sources.list. It's good enough as a
rough cut, anyway - it's nice to say "tens of thousands of unique IPs
have this repo in their sources.list".

It's also a measure which ignores multiple uploads of the same package
(which might be unhelpful if you iterate a lot), and just looks at the
repo as a whole. And all it needs is the number of unique hits on
Packages.bz2/gz.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

On 18/12/10 10:12, directhex wrote:
> It's good enough as a rough cut, anyway - it's nice to say "tens of
> thousands of unique IPs have this repo in their sources.list".

Right - we can / should have this data too, the number of unique IP's.
Note that we don't consider that a very good estimate of the number of
subscribers, because of dynamic IP assignment and network address
translation. If we publish this, I would keep it down to the number in
the past 24 hours, assuming we configure Ubuntu to hit the repo's once a
day (anybody got clarity on the default in that regard?).

Mark

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Dear all, here's a New Year gift for you. The stat collector finally caught up so the figures should all be correct now. Please file additional bugs if you think there's a problem.

Revision history for this message
William Grant (wgrant) wrote :

Due to a couple of remaining bugs, several days' logs are still unable to be completely processed. But the vast majority of the data is there now, and the rest should follow once the bugs are fixed.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Happy New Year, PPA'ers :-)

Revision history for this message
Darxus (darxus) wrote :

This stuff is amazingly badly documented. This is what I ended up doing to get the hit counts for the packages in the first ppa (number 0) belonging to the spamassassin group:

apt-get install ubuntu-dev-tools
lp-shell
sa = lp.people['spamassassin']
ppa = sa.ppas[0]
desired_dist_and_arch = "https://api.edge.launchpad.net/devel/ubuntu/maverick/i386"
for binarypackage in ppa.getPublishedBinaries(distro_arch_series=desired_dist_and_arch):
    print(binarypackage.binary_package_name + "\t" + binarypackage.binary_package_version + "\t" + str(binarypackage.getDownloadCount()))

Once you're in lp-shell, and have sa defined, you can just type "sa.ppas[0]" to get it to tell you which ppa corresponds to that number, so you can look through them to find the right one:

>>> sa.ppas[0]
<archive at https://api.edge.launchpad.net/1.0/~spamassassin/+archive/spamassassin-daily>

Also, very importantly, I could only get lp-shell working on my local machine running X with a graphical browser. The authentication stuff through the text browsers lynx and links didn't work.

Is it really so hard to just put these hit counts on a freaking web page somewhere?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Friday 12 August 2011 21:08:41 you wrote:
> Is it really so hard to just put these hit counts on a freaking web page
> somewhere?

Launchpad is open source, patches welcome.

Revision history for this message
Mark Harman (mark-harman) wrote :

Any progress on implementing this feature (i.e., "Display", not simply as part of an API)?

Regarding using the API, I've tried a python script I found ( http://www.webupd8.org/2010/12/launchpad-finally-gets-ppa-usage-stats.html ) and the method posted by Darxus above. They both give me the error:

distro_arch_series: No such object "https://api.edge.launchpad.net/devel/ubuntu/maverick/i386".

Any ideas? Seems I'm not the only one - https://lists.launchpad.net/openshot.developers/msg07385.html .

Revision history for this message
Martin Erik Werner (arand) wrote :

@Mark
The script posted on that mailing list has a typo (s/1386/i386/), which explains the error received.

Your message seems to indicate that this is not the case, however...

I've cooked up my own kludgy script for this, taking owner and ppaname(s) as arguments; I'm not sure whether it might work better...

Revision history for this message
Mark Harman (mark-harman) wrote :

Hi,

Thanks for that - though I still get the same problem. I tried:

python ppastats.py mark-harman conquests

And got:

distro_arch_series: No such object "https://api.edge.launchpad.net/devel/ubuntu/hardy/i386".

I don't know if it's a problem with my script/setup or my PPA. Is it possible to read other people's PPA stats? If so, maybe someone could try it for my PPA - or alternatively, if someone has a PPA that they know this works for, I can try running the script on that one instead?

Revision history for this message
Martin Erik Werner (arand) wrote :

On 29/09/11 00:09, Mark Harman wrote:
> I don't know if it's a problem with my script/setup or my PPA. Is it
> possible to read other people's PPA stats? If so, maybe someone could
> try it for my PPA - or alternatively, if someone has a PPA that they
> know this works for, I can try running the script on that one instead?

Hmm, I poked at it repeatedly.
It seems the issue comes down to using edge rather than production (as
the deprecation warning indicates). Because my edge login was already
"cached" (and my profile set to edge participation), it wasn't causing
an issue for me.

I've updated the script to use production instead, which worked on a
virtual machine that previously showed the issue you describe.

@Mark, does this version work for you?
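To make the edge vs production difference concrete, here is a small sketch that builds the distro_arch_series link against the production API root instead of the edge one. The helper names are hypothetical and it is not taken from the attached script; it only shows how the URL changes:

```python
# API service roots. "edge" is the deprecated root that appears in the
# "No such object" errors quoted above.
SERVICE_ROOTS = {
    "production": "https://api.launchpad.net/",
    "edge": "https://api.edge.launchpad.net/",
}


def distro_arch_series_url(series, arch, service="production",
                           version="devel", distro="ubuntu"):
    """Build a link such as https://api.launchpad.net/devel/ubuntu/hardy/i386."""
    return "%s%s/%s/%s/%s" % (SERVICE_ROOTS[service], version,
                              distro, series, arch)


def to_production(url):
    """Rewrite an edge API URL to its production equivalent.

    URLs that do not start with the edge root are returned unchanged.
    """
    edge = SERVICE_ROOTS["edge"]
    if url.startswith(edge):
        return SERVICE_ROOTS["production"] + url[len(edge):]
    return url
```

The resulting link should be used together with a login against the same service root (for example "production" in launchpadlib), so that the login and the link point at the same service.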

Revision history for this message
Alin Andrei (nilarimogard) wrote :

I'm also uploading a script I modified a while back, based on the initial script posted here.

Usage: python ppastats.py TEAM PPA DIST ARCH

Example: python ppastats.py webupd8team y-ppa-manager maverick amd64
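For readers without the attachment, a command line of that shape could be parsed roughly as below. This is only an illustrative sketch using argparse; it is not the attached script, and the function and attribute names are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Parse TEAM PPA DIST ARCH, mirroring the usage line above."""
    parser = argparse.ArgumentParser(
        prog="ppastats.py",
        description="Print download counts for the packages in a PPA.")
    parser.add_argument("team", metavar="TEAM",
                        help="Launchpad user or team that owns the PPA")
    parser.add_argument("ppa", metavar="PPA", help="name of the PPA")
    parser.add_argument("dist", metavar="DIST",
                        help="Ubuntu release, e.g. maverick")
    parser.add_argument("arch", metavar="ARCH",
                        help="architecture, e.g. i386 or amd64")
    # argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)
```

Calling parse_args(["webupd8team", "y-ppa-manager", "maverick", "amd64"]) yields an object with team, ppa, dist and arch attributes, which can then be passed to whatever code queries the Launchpad API.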

Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote :

Using https://bugs.launchpad.net/launchpad/+bug/139855/+attachment/2479098/+files/ppastats.py from comment 63 unfortunately creates a lot of bogus output (lots of lines with just "0") for copied packages; however, that might be a bug in Launchpad itself.

See for example on:
python ppastats.py libreoffice ppa precise i386

which should show numbers for libreoffice 3.5.4~rc2-0ubuntu1.

Revision history for this message
Bernmeister (thebernmeister) wrote :

FYI: I've created an application indicator to track my PPA downloads - I intended it only for my use, but others may find it interesting.

https://launchpad.net/~thebernmeister/+archive/indicator-ppa-download-statistics

Revision history for this message
Martin Erik Werner (arand) wrote :

Updated my command-line script a bit, should be prettier now.

Revision history for this message
Bernmeister (thebernmeister) wrote :

FYI: I've moved all the indicators I've done (including indicator-ppa-download-statistics) under one PPA located at https://launchpad.net/~thebernmeister.
