Friday 28 March 2014

Measuring the Top 100 Selling Kindle Books - Annual Sales vs Point-in-Time Snapshots

Amazon Top 100 Kindle Books, Indies versus Trads Sales Revisited Part 2 - Explaining how Daily Snapshots can Differ from Annual Rankings

In a recent blog post, I compared my analysis of Amazon’s Top 100 Kindle eBooks of 2013 with the data recently released by noted SF writer Hugh Howey and his (currently anonymous) data guru. They analysed a number of snapshot datasets collected from Amazon’s website with a web “spider”, a program that can mine publicly available internet data quickly and efficiently. They have since released datasets of increasing size (the latest included 50,000 books) and have delved into books outside the genre categories. My posts can be found under the general title “Amazon Top 100 Kindle Books” on the Dodecahedron Books blog site. Hugh Howey’s can be found on the website “Author Earnings”.
One key difference between my analysis of the Amazon Top 100 eBooks of 2013 and the Howey/DataGuru analysis concerned the proportions of traditionally published books versus Indie books in the top 100. My original estimate of how far Indies had penetrated the Amazon best-sellers was surprising enough, but the Howey/DataGuru data was even more favourable to Indies. The tables below recap those results, updated with Howey/DataGuru’s most recent findings.

Here’s my result for the percentage of Indie vs Trad books in the 2013 Top 100, followed by the new results reported by Hugh Howey (next table).

Amazon Top 100, 2013
Category       Share
Traditional    76%
Indie          24%
Grand Total    100%

These are Howey/DataGuru’s numbers from the 50,000-book sample. I have added together his “From Small or Medium Publisher”, “Big Five Published” and “Amazon Published” categories to match my “Traditional” category. Similarly, I have added together his “Indie Published” and “From Uncategorized Single-Author Publisher” categories to match my “Indie” category.

Hugh Howey’s Amazon snapshot, early 2014
Category       Share
Traditional    64%
Indie          36%
Grand Total    100%
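The regrouping of Author Earnings categories described above can be sketched in Python. This is a minimal sketch, assuming you have a count (or a percentage, since the function renormalizes) for each category; the function name `collapse_shares` is hypothetical, and only the category labels come from the post.

```python
# Category labels as named in the Author Earnings breakdown; the grouping
# into Traditional and Indie follows the choices described in the post.
TRADITIONAL = {
    "From Small or Medium Publisher",
    "Big Five Published",
    "Amazon Published",
}
INDIE = {
    "Indie Published",
    "From Uncategorized Single-Author Publisher",
}


def collapse_shares(counts):
    """Collapse per-category counts into Traditional/Indie percentages.

    `counts` maps a category label to a number of books (or a percentage).
    Raises KeyError for a label that belongs to neither group.
    """
    totals = {"Traditional": 0, "Indie": 0}
    for label, n in counts.items():
        if label in TRADITIONAL:
            totals["Traditional"] += n
        elif label in INDIE:
            totals["Indie"] += n
        else:
            raise KeyError(f"unmapped category: {label!r}")
    grand_total = sum(totals.values())
    # Renormalize so the two groups always add up to 100%.
    return {group: 100 * n / grand_total for group, n in totals.items()}
```

An unmapped category raises an error rather than silently landing in either group, which keeps the Traditional/Indie split honest if the source data ever adds a new category.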

Why are the results different? Why do Indies account for 36% of Hugh Howey’s Feb 7, 2014 snapshot, but only 24% of the 2013 Amazon Top 100 by my count?
As I mentioned in an earlier post, one possibility is simply that a lot changed between the periods the two samples represent. To recap that post:
“My Amazon Top 100 analysis was based on Amazon’s list of their top 100 books of 2013.  In a sense then, it could be thought of as representing the mid-point of the 2013 data, since it is an accumulation of data collected throughout the year.  Hugh’s analysis was from a snapshot in February 2014…about 8 months passed between the mid-point of one sample and the time of the second.  In the current publishing world, a lot can change in 8 months, as we know.”
I also noted a second possibility, which I will explore below. To recap that post:
“The second possibility is that the traditionally published books in the top 100 were more consistently present in that list over a longer time period, whereas any particular Indie book spends less time in the top 100, to be replaced by a new Indie book… there is more “churn” in the Indie books than the Trads….because the Trad authors have had longer careers and therefore have a ready-made fan base that allows [any particular trad title] to stick on the top of the list for a longer time.   Indies have a more experimental audience, so any particular book doesn’t stay at the top as long, though as a group they are very successful.”
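The churn idea can be illustrated with a tiny hand-made example. The three books and their sales figures below are purely hypothetical: one steady “Sticker” and two “Non-Stickers” that each have one big month and one quiet month. Ranked month by month, the Sticker never takes the top spot; ranked by total sales over the period, it does.

```python
def top_k(sales, k):
    """Indices of the k books with the highest sales (ties broken by index)."""
    return sorted(range(len(sales)), key=lambda i: (-sales[i], i))[:k]


def sticker_share(indices, is_sticker):
    """Fraction of the given books that are Stickers."""
    return sum(is_sticker[i] for i in indices) / len(indices)


# Hypothetical toy data: book 0 is a steady Sticker; books 1 and 2 are
# Non-Stickers that take turns having a big month.
is_sticker = [True, False, False]
monthly = [
    [90, 100, 10],   # month 1: sales of books 0, 1, 2
    [90, 10, 100],   # month 2
]

# Point-in-time view: average the Sticker share of each month's top spot.
snapshot = sum(sticker_share(top_k(m, 1), is_sticker) for m in monthly) / len(monthly)

# Cumulative view: rank books by total sales, then take the top spot.
annual_totals = [sum(m[i] for m in monthly) for i in range(len(is_sticker))]
annual = sticker_share(top_k(annual_totals, 1), is_sticker)

print(f"snapshot share: {snapshot:.0%}, annual share: {annual:.0%}")
# prints "snapshot share: 0%, annual share: 100%"
```

The example is deliberately extreme, but it shows the mechanism: a book that is never the single best seller in any month can still top the cumulative ranking if it sells well every month.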
To explore this possibility, I constructed a model set of 200 books in Excel, split into two groups:
· “Non-Stickers”, which sold a randomly generated number of copies each month, between a lower limit of 10 and an upper limit of 1000.
· “Stickers”, which also sold a randomly generated number of copies each month, up to the same upper limit of 1000, but with a slightly higher lower limit that could be varied from one model run to the next.
 
I then generated twelve months of artificial data, recording each month what percentage of the top-selling books were “Non-Stickers” and what percentage were “Stickers”. Note that in the non-control scenarios the “Stickers” have an edge in book sales, but only a slight one. Ten trials were performed under each set of assumptions, to ensure that the random number generator gave a good representation of the underlying statistical assumptions. This relies on the Law of Large Numbers, which simply means that as you do more trials, your averaged results get closer and closer to the theoretical values built into your model.
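The simulation described above can be sketched in Python. This is a minimal re-creation under stated assumptions, not the original Excel workbook: I assume each book's monthly sales are an independent uniform whole number (inclusive at both ends, as with Excel's RANDBETWEEN), that the “top decile” means the 20 best sellers out of 200, that the annual ranking is by twelve-month total sales, and that ties are broken at random. The function names and the seed are my own choices, so the percentages it prints will not match the post's model runs exactly.

```python
import random

N_BOOKS = 200            # size of the model population, as in the post
STICKER_SHARE = 0.64     # 64% Stickers / 36% Non-Stickers
NON_STICKER_LOWER = 10   # Non-Stickers always sell 10..1000 copies per month
UPPER = 1000             # shared upper limit for both groups
MONTHS = 12
TOP_FRACTION = 0.10      # "top decile" of the sales rankings


def top_indices(sales, k, rng):
    """Indices of the k best-selling books; ties are broken at random."""
    order = list(range(len(sales)))
    rng.shuffle(order)                   # random order among equal sales...
    order.sort(key=lambda i: -sales[i])  # ...is kept because list.sort is stable
    return order[:k]


def run_trial(sticker_lower, rng):
    """One twelve-month run: (mean monthly share, annual share) of Stickers in the top decile."""
    n_stickers = round(N_BOOKS * STICKER_SHARE)           # 128 of 200 books
    is_sticker = [i < n_stickers for i in range(N_BOOKS)]
    lowers = [sticker_lower if s else NON_STICKER_LOWER for s in is_sticker]
    k = round(N_BOOKS * TOP_FRACTION)                     # 20 books

    totals = [0] * N_BOOKS
    monthly_shares = []
    for _ in range(MONTHS):
        # Assumption: each month is an independent uniform draw, inclusive
        # at both ends (randint behaves like Excel's RANDBETWEEN here).
        sales = [rng.randint(lo, UPPER) for lo in lowers]
        top = top_indices(sales, k, rng)
        monthly_shares.append(sum(is_sticker[i] for i in top) / k)
        totals = [t + s for t, s in zip(totals, sales)]

    # Annual ranking: rank books by their twelve-month cumulative sales.
    annual_top = top_indices(totals, k, rng)
    annual_share = sum(is_sticker[i] for i in annual_top) / k
    return sum(monthly_shares) / MONTHS, annual_share


def simulate(sticker_lower, trials=10, seed=2014):
    """Average both measures over independent trials (ten by default, as in the post)."""
    rng = random.Random(seed)
    results = [run_trial(sticker_lower, rng) for _ in range(trials)]
    monthly = sum(m for m, _ in results) / trials
    annual = sum(a for _, a in results) / trials
    return monthly, annual


if __name__ == "__main__":
    for lower in (10, 62, 125):
        monthly, annual = simulate(lower)
        print(f"Stickers' lower bound {lower:>3}: "
              f"monthly snapshots {monthly:.0%}, annual ranking {annual:.0%}")
```

With a lower bound of 10 both measures should sit near 64%. Raising the Stickers' lower bound should reproduce the qualitative pattern: the monthly share creeps up only slightly, while the annual share climbs much faster.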
The first two graphs show the results for a dataset of 64% “Stickers”/36% “Non-Stickers”, with each group randomly selling somewhere between 10 and 1000 books per month. I chose the 64/36 ratio because those are the proportions of Trads to Indies in Hugh Howey’s dataset of 50,000 Amazon books. This is the control scenario, in which Stickers and Non-Stickers sell the same number of books per month on average: 505 books each, which is the mean of a uniform random number generator that picks a whole number between 10 and 1000, with every number equally likely to be chosen.
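The 505 figure is simply the midpoint of the range. Writing X for one book's sales in a month and X_m for its sales in month m (my notation, not the post's), a whole number drawn uniformly from a to b inclusive has mean

```latex
\mathbb{E}[X] = \frac{a + b}{2} = \frac{10 + 1000}{2} = 505,
\qquad
\mathbb{E}\Big[\sum_{m=1}^{12} X_m\Big] = 12 \times 505 = 6060 .
```

so in the control scenario both groups expect about 6,060 copies over the year.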
As you can see, in this scenario the average of the twelve monthly snapshots is almost exactly the same as the cumulative annual measure. That is, each month about 64% of the books in the top decile of the sales rankings were Stickers, which is also their share of the overall population of books. Their share of the top decile in the annual rankings is also 64%.
 


I then raised the lower limit of books sold for the “Stickers” slightly with each model run, while keeping it the same for the “Non-Stickers”. The results of half a dozen model runs are shown below. As you can see, the combination of my annual result (76% Trad) and the Howey/DataGuru snapshot (roughly two-thirds Trad) is reproduced when the “Stickers” have a lower bound of about 60 sales per month. That implies an average of about 530 books per month for the Stickers, compared with 505 books per month for the Non-Stickers (the Indies, in this analogy). This is a difference that hardly shows up in the monthly data, but is very noticeable in the annual data.
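One back-of-the-envelope way to see why a small monthly edge becomes visible in annual data: the average gap between the groups adds up twelve times over a year, but the random month-to-month noise of independent draws only grows with the square root of twelve. The sketch below assumes independent months, as in the model; using the pooled standard deviation as the "spread" yardstick is my own choice, not something from the post.

```python
import math

MONTHS = 12


def uniform_int_mean_sd(lower, upper):
    """Mean and standard deviation of a uniform random whole number in [lower, upper]."""
    n = upper - lower + 1                       # number of equally likely values
    return (lower + upper) / 2, math.sqrt((n * n - 1) / 12)


non_mean, non_sd = uniform_int_mean_sd(10, 1000)   # Non-Stickers
stk_mean, stk_sd = uniform_int_mean_sd(60, 1000)   # Stickers, lower bound of about 60

gap_month = stk_mean - non_mean   # 530 - 505 = 25 copies in a single month
gap_year = MONTHS * gap_month     # the average gap adds up: 12 x 25 = 300 copies

# Rough yardstick for month-to-month noise: the pooled standard deviation.
# With independent months, the noise in a twelve-month total grows only with sqrt(12).
spread_month = math.sqrt((non_sd ** 2 + stk_sd ** 2) / 2)
spread_year = spread_month * math.sqrt(MONTHS)

print(f"Gap as a fraction of the spread: monthly {gap_month / spread_month:.2f}, "
      f"annual {gap_year / spread_year:.2f}")
```

The gap is roughly 0.09 of the spread in a single month but roughly 0.3 of the spread over a year, a ratio of exactly the square root of 12 (about 3.5), which is why the annual ranking separates the two groups so much more sharply than any monthly snapshot.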


The exact numbers for the six model runs are shown below, along with a graph of the results.
 
Stickers’ monthly sales limits (copies) and the Stickers’ share of the top decile:

Lower Bound   Upper Bound   Annual, Top Decile   Monthly, Top Decile
10            1000          64%                  64%
50            1000          68%                  65%
62            1000          76%                  67%
75            1000          79%                  67%
100           1000          81%                  67%
125           1000          90%                  67%
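Reading the table alongside the average monthly sales each lower bound implies makes the pattern easier to see. The sketch below only does arithmetic on the six rows of the table; the percentage "edge" over the Non-Stickers is a derived figure of mine, and `summarize` is a hypothetical helper name.

```python
UPPER = 1000
NON_STICKER_MEAN = (10 + UPPER) / 2   # 505 copies per month

# (Stickers' lower bound, annual top-decile %, monthly top-decile %), copied from the table
RUNS = [
    (10, 64, 64),
    (50, 68, 65),
    (62, 76, 67),
    (75, 79, 67),
    (100, 81, 67),
    (125, 90, 67),
]


def summarize(runs):
    """Per run: (lower bound, Stickers' mean monthly sales, % edge, annual minus monthly)."""
    rows = []
    for lower, annual, monthly in runs:
        sticker_mean = (lower + UPPER) / 2          # mean of a uniform draw on [lower, UPPER]
        edge_pct = 100 * (sticker_mean / NON_STICKER_MEAN - 1)
        rows.append((lower, sticker_mean, edge_pct, annual - monthly))
    return rows


for lower, mean, edge, gap in summarize(RUNS):
    print(f"lower bound {lower:>3}: Sticker mean {mean:5.1f} ({edge:+.1f}%), "
          f"annual minus monthly = {gap} points")
```

As the Stickers’ average edge grows from nothing to about 11%, their monthly share barely moves (64% to 67%), while the gap between the annual and monthly measures widens from 0 to 23 percentage points.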
 
So, applying these results to the Trad/Indie question, the model suggests that if Trad-published books tended to be only a little more consistent in their monthly sales, they could quite easily make up about 76% of the books in the Amazon Top 100 for the year 2013, but only about 64% to 67% of a daily snapshot in early February 2014.
Obviously, this exercise doesn’t prove that this is what happened, but it does show that it is quite plausible. Furthermore, if the “stickiness factor” is related not to publisher category but to the length of time a writer has been in the public eye, then this Trad/Indie difference should wither away as Indies have more time to establish themselves in the marketplace.


 
