Dodecahedron Books: Book Statistics Corner, Part 3 – The Decay Curve of a Book Series

These days, a lot of writers are doing series. There are good reasons for that: once you have built up an audience for a certain setting, cast of characters and genre, you would like to keep that audience, and a series seems the natural way to do it. But what might you actually expect from a series? For example, how many people will move on from Book 1 to Book 2, Book 2 to Book 3, and so on? It seems likely that you will lose some people along the way, but is there a pattern to that? To get a feel for this, let’s look at some results for some well-known, long-running book series. Naturally, we can only look at a few “ideal type” cases, but with luck they will give us some insights that are typical of most series.

First, we will look at Patrick O’Brian’s Aubrey/Maturin historical fiction series. That is the series of books that the 2003 movie “Master and Commander: The Far Side of the World”, featuring Russell Crowe, was based on. The books are about a Royal Navy captain and a ship’s doctor/spy, and are set during the era of the Napoleonic wars, roughly 1800 to 1815. Why did I pick this series first?

· I have read them all, so I have a good sense of how the series evolved.

· It’s a long series (20 books) so it really tests the idea of loyalty to a series.

· It has sold a lot of books, and has had a lot of fans, so the statistical power of the analysis should be high (that just means that the numbers are big, so the results are probably grounded in some underlying realities, not random noise).

· It has spanned the era of print book sales in bookstores and the era of ebook sales in online stores, so we might be able to see whether the change in how books are stocked and delivered has affected how people consume series.

To begin, we will look at how the series did on a number of Goodreads measures. As you may know, Goodreads is a site where readers can leave reviews, ratings, and recommendations of the books they have read. The measures that we will look at are Number of Reviews, Number of Ratings, Average Rating, and Number of Editions. The results, as taken from the Goodreads website, are shown below. The year that each book was first published is also shown, to give some idea of the time scale involved. There’s a reason the first four books are highlighted, which we will get to later.

Book Num	Title	GR Reviews	GR Ratings	GR Avg Rating	Editions	First Pub
1	Master and Commander	1,700	21,161	4.08	90	1969
2	Post Captain	459	8,760	4.29	63	1972
3	HMS Surprise	319	7,879	4.40	54	1973
4	The Mauritius Command	224	6,885	4.32	53	1977
5	Desolation Island	233	6,307	4.35	50	1977
6	The Fortune of War	170	5,855	4.35	43	1978
7	The Surgeon's Mate	152	5,575	4.35	40	1980
8	Treason's Harbour	110	5,217	4.35	39	1980
9	The Ionian Mission	137	4,390	4.28	41	1981
10	The Far Side of the World	156	5,473	4.41	48	1984
11	The Reverse of the Medal	126	4,221	4.38	40	1986
12	The Letter of Marque	119	4,772	4.43	36	1988
13	The Thirteen Gun Salute	125	3,920	4.35	36	1989
14	The Nutmeg of Consolation	114	4,113	4.37	39	1991
15	Clarissa Oakes/The Truelove	107	3,778	4.33	35	1992
16	The Wine-Dark Sea	102	3,709	4.36	34	1993
17	The Commodore	99	3,626	4.37	38	1994
18	The Yellow Admiral	100	3,813	4.32	36	1996
19	The Hundred Days	93	3,327	4.31	32	1998
20	Blue at the Mizzen	128	3,213	4.34	41	1999
Total		4,773	115,994	4.34 (mean)	888	

As you can see, for most measures there was a fairly steady decline from Book 1 to Book 20, though some books in the later part of the series did better than the book immediately before them. In mathematics, we would say that the series is not monotonic, but in statistics we might say that it comes pretty close to one (it is quite well modelled by a power law, in fact).

The data is graphed above, with the various measures (Number of Editions, Number of Goodreads Ratings, and Number of Goodreads Reviews) scaled so that the value for the first book is set to 100, and the values for later books are proportional to that initial value. So, for example, the first book had 90 editions, while the second book had 63 editions. In our scaled variable we have assigned 100 to the first book, and 70 to the second book (63/90 = 0.70, so the second book is given the value 70). The reason for using this scaling (it’s called normalizing) is so that we can compare the three line graphs on the same scale.
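For readers who want to reproduce the scaling, here is a minimal Python sketch of the normalization described above. The function name normalize_to_first is just an illustrative choice of mine, and the edition counts are copied from the table.

```python
# Editions per book (Books 1 to 20), copied from the Goodreads table above.
editions = [90, 63, 54, 53, 50, 43, 40, 39, 41, 48,
            40, 36, 36, 39, 35, 34, 38, 36, 32, 41]


def normalize_to_first(values):
    """Scale a series so that its first entry becomes 100 (hypothetical helper)."""
    first = values[0]
    return [100.0 * v / first for v in values]


scaled_editions = normalize_to_first(editions)

# Book 2 comes out at 70, matching the 63/90 = 0.70 example in the text.
print([round(x) for x in scaled_editions[:5]])
```

The same function can be applied to the ratings and reviews columns, which is what puts all three curves on one 0 to 100 scale.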

There are a lot of interesting results here. First off, we see that all of the graphs decline steadily (each shows a decay curve), but they fall off at different rates. The fall-off for the number of editions is slowest. That’s interesting, since the number of editions is probably the measure that best tracks the number of books sold and read. After Book 5, the number of editions printed falls to about 40% to 50% of the number of editions printed for the first book. So, Patrick O’Brian appears to have held on to about half of his initial book purchasers as the series matured. There was an uptick at Book 10 - that’s “The Far Side of the World”, which was also the title of the movie starring Russell Crowe. So, that clearly seems to have given the book a bounce.

There was also an uptick for the final book of the series “Blue at the Mizzen”. A reasonable hypothesis is that those extra editions may represent sales to people who followed part of the series and dropped out, but who might have decided to buy the final book to see how it turned out. However, in some ways Book 19 was really the end of the series (Napoleon is defeated) and book 20 could be thought of as the start of another series that featured the same main characters in a different setting (the plot moves from the Napoleonic wars to the wars of liberation in South America). But the author died shortly after Book 20, so there was no chance for a “next generation” follow-up. So the final book uptick might be related to people buying into a new series or it might be related to the wrap-up of the original series. We’ll never know.

There are some other interesting features of these decay curves - first, how the decay curve of Goodreads ratings falls off more rapidly than the decay curve of the number of editions and secondly how the line representing the number of Goodreads reviews falls off even more sharply. So, it appears that people might be less willing to invest the time and energy into rating or reviewing books that they read in a series, as the series goes on. Also, it appears that they are more willing to invest the time in a rating than in a review. That makes sense, as a rating only takes a few seconds, while a review can take five or ten minutes - even much longer than that, for those who take their reviewing very seriously indeed.

One other interesting aspect of the decay curves is that they are well modelled by our old friend the power law, of which I have written previously. The smooth fitted lines next to the jagged data lines are these power functions. The R-squared values next to the respective lines indicate that the fits are quite good, in a statistical sense (an R-squared of 1.00 would indicate that the data fit the power-law function perfectly, so values in the 0.85 to 0.95 range are really quite good fits).
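As a rough illustration of what such a fit involves, the sketch below fits y = a * n^b to the Goodreads ratings counts, where n is the book number. It assumes the fit is done by ordinary least squares on the logarithms of both variables, which is one common approach; the post does not state how the fitted lines were actually produced, so the numbers this prints need not match the R-squared values on the graph. The function name fit_power_law is my own.

```python
import math

# Goodreads ratings counts for Books 1 to 20, copied from the table above.
ratings = [21161, 8760, 7879, 6885, 6307, 5855, 5575, 5217, 4390, 5473,
           4221, 4772, 3920, 4113, 3778, 3709, 3626, 3813, 3327, 3213]


def fit_power_law(values):
    """Fit y = a * n**b for n = 1, 2, ... by least squares in log-log space.

    Returns (a, b, r_squared), where r_squared is measured on the logged data.
    This is an assumed method, not necessarily the one behind the graph.
    """
    xs = [math.log(n) for n in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # exponent of the power law
    log_a = mean_y - b * mean_x        # intercept in log space
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


a, b, r2 = fit_power_law(ratings)
print(f"ratings ~ {a:.0f} * n^{b:.2f}, R-squared (log-log) = {r2:.2f}")
```

A negative exponent b corresponds to a decay curve, and the closer b is to zero, the flatter the curve.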

The above data also shows that, after the first few books, the numbers of reviews and ratings correlated rather nicely with the number of editions of the book, as can be seen in the graph below. Since we assume that the number of editions correlates fairly well with the number of copies sold (which assumes, of course, that each edition had more or less the same number of copies printed and sold), we can have some more confidence in the notion that the total number of reviews a book gets scales fairly well with its total sales. Note, however, that for the initial books in the series, the number of reviews was higher than would be expected from the relationship in the graph. Again, this indicates that people may be more enthusiastic about reviewing or rating near the beginning of a series than later on.
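One simple way to put a number on that relationship is a Pearson correlation between editions and reviews. The sketch below is only an illustration: treating "after the first few books" as Books 5 to 20 is my assumption, and the helper name pearson_r is hypothetical.

```python
import math

# Goodreads reviews and editions for Books 1 to 20, copied from the table above.
reviews = [1700, 459, 319, 224, 233, 170, 152, 110, 137, 156,
           126, 119, 125, 114, 107, 102, 99, 100, 93, 128]
editions = [90, 63, 54, 53, 50, 43, 40, 39, 41, 48,
            40, 36, 36, 39, 35, 34, 38, 36, 32, 41]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Assumption: "after the first few books" is taken to mean Books 5 to 20,
# i.e. list positions 4 onward.
r_later = pearson_r(editions[4:], reviews[4:])
print(f"Correlation of editions with reviews, Books 5 to 20: {r_later:.2f}")
```

Moving the cutoff to a different book is a one-line change to the slice, which makes it easy to see how sensitive the result is to the early, heavily reviewed volumes.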

It is also worth noting how the average rating of the books changed as the series progressed. As you can see, the first book had by far the lowest rating (4.08), with the second (4.29) also toward the low end; after that the ratings were quite consistent, mostly between 4.3 and 4.4. So, it would appear that as the series went on, the readers who dropped out were (not surprisingly) those who were less satisfied with the books, and the readers who stayed with the series were those who were more satisfied. So, the audience was smaller, but more loyal, as the series continued.

Now let’s have a look at the Amazon Kindle numbers for Patrick O’Brian’s Aubrey/Maturin series. In this case, we will look only at the trend from Book 5 to Book 20, since the publisher has not yet released the first four books in ebook format. I suppose they are going with the reasoning that ebook sales could “cannibalize” print book sales, but they are only concerned about that happening to the earliest books in the series. I don’t know if that logic is still valid, but traditional publishers seem to be holding on to it in this instance. We are also going to assume that the number of Kindle reviews correlates reasonably well with Kindle sales, i.e. a book with twice as many reviews as another probably sold about twice as many copies, at least to a first approximation.

As we can see, the number of reviews in the Kindle store does not show the decay curve pattern that was evident in the Goodreads data, which was probably based primarily on legacy print book sales. In the Kindle store, the last two books had about as many reviews as the first two available (Books 5 and 6), and the others had 50 or more reviews, compared to the 85 or 90 for the top reviewed volumes. So, perhaps the always-available nature of ebooks in the Kindle store has altered the underlying sales dynamics of the series. Of course, when it comes to ebook sales for books published before 2000, we are always looking at the “long tail”, so we might be seeing the dynamics of the long tail, which are generally thought to follow a much flatter power law than initial book sales.

These are all good things to keep in mind when you evaluate the success of your own series, if you are a writer or publisher, especially if you are a self-publisher or small scale publisher. To summarize:

· There is a power-law-like decay curve (in sales and other measures), or at least there was in the legacy system.

· The slope of that curve varies depending on the measure, with the tendency to rate or review probably falling off faster than sales, as the series goes on.

· If your book gets made into a movie, you will most likely get a bump in sales :).

· There may be a bump at the end of a long series, as people who dropped out of some of the middle books come in to see how things turned out.

· Numbers of reviews and/or numbers of ratings (not average ratings) probably scale reasonably well with sales.

· The dynamics of print book series and ebook series may be quite different, with the ebook series possibly having a much flatter decay curve (or none at all).

Well, that’s just one series (a highly successful one) whose sales dynamics we have attempted to infer from publicly available Goodreads and Amazon Kindle data. In later posts we will see whether these results hold for some other series in other genres, such as Robert Jordan’s Wheel of Time series or J.K. Rowling’s Harry Potter series. We will also try to test some recent ebook-only series, to see if the dynamics of those are different.

Friday 20 June 2014
