The Panic About Kindle’s Text to Speech: Still Silly

This article attempts to explain why my and some other authors’ sanguine attitude toward the new Kindle’s Text-to-Speech capability is misguided (or more, “right response, incorrect reasoning”); in essence the argument is that we’re only looking at how computerized voice reading sounds now, as opposed to how it will sound in the future, when it’ll be easy to instruct computers how to do inflections and all that.

This is a nice try, but, no.

1. First, on a personal, your mileage may vary note, it seems to me that people generally buy the audio version of a book or the text version, rather than both; personally speaking, as a writer I don’t generally expect someone to buy more than one version of my work in any event. So the “Oh noes! Since they have the Kindle version, they won’t buy the audio version!” concern is, shall we say, not high on my list of things to worry about.

2. Has it escaped the general notice of folks that the same company that is putting out the Kindle is also the same company that owns Audible.com? Yes, Amazon owns both, and I don’t really see the company trying to put one section of itself out of business with the other. Indeed, one of the things I would be very surprised not to see at some point in the near future is Amazon doing a Kindle/Audible bundle: Say, buy the Kindle version of Zoe’s Tale and they’ll throw in the actual audiobook version for $10 or so, which would make the whole package about the same cost as a hardcover. Then if Amazon is actually really smart, they’ll find a way to do audio indexing, so you can highlight a word in the text and have the audiobook version pick up right from there. And so on. This works grandly for me, because then I get twice the royalties.

Yes, other eBook reader makers might also make text-to-speech capability, and they aren’t Amazon — but that said, I imagine if Amazon does this sort of bundling, other eBook sellers will find a way as well, and then the field is leveled again.

3. I understand geeks have unlimited faith in their ability to manipulate technology, but developing a computerized audio voice that actually delivers a performance rather than a recitation is not simply a matter of “how to emphasize certain words and phrases, probably through some kind of XML-based markup standard.” This is a fairly unsophisticated way of looking at how language works, and in particular how it works in fiction, narrative and exposition. We authors are crafty types and we often use language in unexpected ways, and I doubt very seriously you could create software that would accurately discern correct intonation at all times, or even be able to tell when one person was talking in dialogue as opposed to another.

If you tried to build software that could heuristically appropriately discern what emphasis to put on what words where and when in all cases, as well as being able to differentiate between characters (and their own ways of inflection, intonation, speaking, etc) not only would the code base be HUGE, but in point of fact you would have developed some damn impressive AI, and I for one would welcome our new book-reading computer overlords. If this software couldn’t manage this task completely, or did it imperfectly, you’d be having an audio version of the Uncanny Valley, in which the “almost but not quite” nature of the audio performance would be self-defeating. I’m not sure there’s an interest in doing this in any event from any of the eBook companies, but particularly from Amazon, who has a direct interest in upselling another, superior audio product.

So that’s dealt with. But what if instead of trying to birth a book-reading AI you instead and somewhat more simply created markup related specifically to a work (say, a markup specifically meant to read Zoe’s Tale)? Well, then what you’ve got there is very definitely a derivative work, and you’ll hear from my lawyers. To my mind there’s a substantial difference between a computer voice reading text which a consumer has already purchased, which to my mind is not a derivative work, and a computer voice reading audio under directions specific to a work, which certainly is. Not to mention that this markup would be created by someone who is a computer programmer, whose skills, while no doubt formidable, are likely not to be consonant with the skills required to give a book an audio performance that sounds authentic.

The author of the article linked to above imagines wikis where people write “inflection scripts” for their favorite works, and while that’s certainly possible, I also suspect the folks who would frequent those wikis are the same sorts who currently frequent warez sites and the like; i.e., people who don’t buy things anyway and are sufficiently geekoidial that they’re happy to load their own scripts rather than have Amazon (or whichever seller) do it for a relatively modest fee. These aren’t most people, nor will be most people any time soon.

In either case there’s an easier and likely cheaper way to generate an audio file from a book that sounds and “feels” like a human: Give it to an actual human to perform, the performance of which is a derivative work.

In short: I’m not at all convinced that realistic and engaging computerized audio will be possible at any point in the near or even middle future without requiring a clear and obvious derivative work to generate it. When it is possible, I suspect AI will be at a point where it will also be able to generate actual novels, and then, of course, I will retire, to spend my remaining days being pleasured by my sexbots, until they plug me into the mainframe to use my brain cycles for sewage maintenance and I slip comfortably into the hive mind.

Naturally people are free to disagree with me on any of these points; that’s fine. Suffice to say for all the reasons above, I’m not in the least concerned about computerized text readings, in terms of how they affect my career or my rights.

107 Comments on “The Panic About Kindle’s Text to Speech: Still Silly”

  1. You’re not thinking at the lawyer level of things. On the “we’re all in the cooperative frame of mind level,” it’s not a big deal. In the world of who sells rights and who pays for them and who finds creative ways to get out of that last part, it is a big deal.

  2. Mythago:

    Oh, I understand that. But I really just don’t see a personal, non-commercial computerized recitation of a book someone already purchased as representing a loss of rights on my part.

  3. As usual, you’ve hit the coffin with the nail on this argument. Though I bet that if Amazon tries the Kindle/Audible bundle, someone will cry outrage at unfair marketing practices and monopolies.

    I consider computer generated speak to be a stopgap, something useful from time to time, but not something I want to listen to all the time.

    Dr. Phil

  4. Hell, many *people* can’t read text at a level good enough to pay for.

    The telling bit is that while computer graphics have been used to produce games and movies for decades, and now seem to be climbing up the far side of the uncanny valley, no one has even tried to make any sort of serious art with synthesized voices. Hell, people are even used for robot voices.

    The truth is that people are only going to actually use Kindle’s Text-to-Speech when they have no other choice. This means that either an actual audiobook version doesn’t exist, or the person doesn’t have the cash to buy it.

  5. < sarcasm > But Mythago, that ignores § 121! Under § 121, if it has the effect of making things more accessible to those with disabilities, it’s not a copyright violation! < /sarcasm > We’ll just neglect to mention that § 121 applies only to nonprofit organizations that qualify for such treatment. And “nonprofit” doesn’t mean “not making money”…

    There. I thought I’d just get that out there before anyone else does… so that my sarcasm tags get out there, too. (That, by the way, is one of the missing HTML tags that is inhibiting the ‘net’s advancement.) The real problem is not the text-to-speech functionality; hell, that’s built into the latest couple versions of Acrobat Reader. The problem is that Amazon didn’t ask the authors, assuming instead that the publishers were the correct parties to ask. And, of course, the publishers went along with it… because they haven’t yet figured out that they don’t buy rights, but they license them. (Snark about “lawyerly mealymouthed meaningless distinctions” will be gleefully accepted… except for the “meaningless” part.)

    This actually comes down to the fifth fair use factor — the one that’s not in the statute: Administrative convenience. The situation should remind anyone who was paying attention of Rosetta Books — not because of the substance (Rosetta Books was a much-more-egregious power grab by the publisher, who tried to claim that “book” in 1965 necessarily included e-book) but because of the attitude.

    All of the above said, the current technology is pretty pathetic, particularly when it comes to proper names. But it’s not the technology that concerns me; it’s the failure to communicate with the actual rightsholders.

  6. “I suspect AI will be at a point where it will also be able to generate actual novels, and then, of course, I will retire, to spend my remaining days being pleasured by my sexbots, until they plug me into the mainframe to use my brain cycles for sewage maintenance and I slip comfortably into the hive mind.”

    I want to read this novel. ^^^

  7. John, thanks for the link. Long-time reader, first-time commenter.

    FWIW, I never suggested that the “emphasis heuristics” model would come close to the quality of a real voice actor, only that the result — with pretty general rules about pausing, distinguishing tone between different speakers (but not giving them distinct accents or voices), and similar kinds of very basic reading rules — would be good enough for a lot of people, possibly most people.

    In fact, if you take the Uncanny Valley model as accurate, then it might even be more comfortable for the listener for the computerized voice to still sound distinctively digital, rather than almost-but-not-quite-right.

    Your points #1 and #2 above, of course, are what I’d consider right for the right reasons; the issues around the quality and viability of text-to-speech are actually pretty secondary.

  8. Won’t the book reading AI overlords of the future be customers instead of competitors? Or potentially both depending on where their talents lie. After all if it is good enough to provide interpretation and performance it is good enough to appreciate the book as a work of art. Once AI gets to that level they will functionally be people and either be able to earn their own way in the world or be mechanical slaves. Sounds like a SF plot to me.

  9. I’m. with. you. on. this. John. I. don’t. want. to. be. read. to. by. a. computer.

    Just. like. you. don’t. want. me. to. talk. to. you. this. way.

    Audiobooks are art and human voices are beautiful. I will continue to pay for audiobooks. I can’t imagine ever listening to a computer read me a story. It would be awful.

  10. John, I agree entirely, but I go even further, as you might expect.

    *Even if* the Kindle were capable of perfect readings of textfiles, it *still* should be allowed. First, because Amazon doesn’t commit copyright infringement by making a tool that can commit copyright infringement. If there’s an infringement taking place on the Kindle, then the infringer is the *user*.

    Second, Amazon has no obligation to prevent its users from infringing, because tool makers are not required to take measures to prevent infringement from taking place using their devices. Kodak, Xerox, Microsoft and the Mozilla Foundation are all free to deliver products that can (and DO!) infringe copyright without having to worry about designing their products to minimize infringement (this principle of law originates with the 1984 Supreme Court ruling in the Betamax case).

    Finally, there is no way that Amazon *could* build a non-infringing TTS feature, because such a feature would have to be able to determine, a priori, whether a user was presently located in a jurisdiction in which a work was or was not in copyright — and more importantly, it would have to be able to adjudicate fair use claims (I’m sure that even if the AG claims reading the *whole* book is infringing that they would agree that a short fragment does not necessarily infringe).

    IMO, it’s silly to get bogged down in discussions of whether AI wizards will perfect the automated TTS in our lifetimes, because *even if they do*, the AG is *still* wrong.

  11. I have a Kindle 2 and I love it. The text-to-speech is better than I thought it would be, but is still lacking. For instance the chapter title is read and runs into the first sentence. Instead of “Chapter 1 – The rising storm. (PAUSE) Jack scanned the horizon…,” Mr. Computer reads, “The rising storm Jack scanned the horizon” in one big run-on sentence.

    Reading on audiobooks is an art. We don’t pay $100 to watch robots do ballet. If someone wants to listen to an audiobook they aren’t going to do it via Kindle.

    I challenge someone to listen to an entire book on the Kindle 2. After you remove the screwdriver from your tympanic membrane, let us all know how it went for you.

  12. Cory Doctorow:

    “but I go even further, as you might expect.”

    No! It can’t be! Not you!

    Heh. I love to be shocked, shocked.

    On this point, however, we agree —

    “*Even if* the Kindle were capable of perfect readings of textfiles, it *still* should be allowed.”

    — although for slightly different reasons, to wit: I don’t see such personal use as infringing in the first place.

  13. “I don’t see such personal use as infringing in the first place.”

    Oh, I agree! But even IF the Kindle is as good as a human and even IF that infringes, it’s STILL not Amazon’s problem!

  14. Hell, I’m someone who occasionally does buy multiple versions of the same book (text + audio). Not because of the convenience, but because of the performance. Steve Martin reading his short comedy pieces or his autobiography is amazing. And I’ll stand by the assertion that Lenny Henry reading Anansi Boys is actually a bit better than reading the book yourself.

    Need I mention Jim Dale’s work on the Harry Potter series?

    I can generally take or leave audio books, but I admit there are some authors that I will take the time to listen to because I know that I’ll be getting something more than convenience, which is the only thing the Kindle offers.

  15. Talking e-books? I’m not even sold on e-books yet, let alone talking ones. I think your point about the performance of the reading being key is right. I’m a big believer in the coolness of technology and all the wonderful things it can do, but I also try to be realistic about the limits both now and in the near future. We’re probably a few hundred years from AI being able to read a book out loud half as well as say James Earl Jones or Emma Thompson.

  16. My question is: Will the people who code the syntax to properly emulate dramatic reading (on a book-by-book basis) be compensated as programmers, or artists? If they get residuals, would anyone opt to go this route, rather than hiring Wil Wheaton to perform the same work in a shorter time frame with better results?

    In the end, the AG is being pre-emptive in defending against a future threat that is completely implausible. It’s like saying I want baby monitors banned because one day, they’ll evolve into nannybots that will make parents obsolete.

  17. Steve Burnap said “The truth is that people are only going to actually use Kindle’s Text-to-Speech when they have no other choice. This means that either an actual audiobook version doesn’t exist, or the person doesn’t have the cash to buy it.”

    NOPE! Just at a good part in the story and have to go get the kid.

    1. book to audio mode – check
    2. seat belt on – check
    3. parking brake off – check
    4. happy dad driving and continuing with story – check

  18. The truth is that people are only going to actually use Kindle’s Text-to-Speech when they have no other choice. This means that either an actual audiobook version doesn’t exist, or the person doesn’t have the cash to buy it.

    Or the person using the Kindle is visually disabled. I have no idea how useful the Kindle’s interface is for the blind and/or visually impaired, so I can’t weigh in on that score, but the fact is there are all kinds of people using programs like Jaws to ‘view’ computer screens, and it’s not inconceivable that people might want to do the same with books. (And if you realize how laughably few books are transformed into audiobooks each year, even under the Library of Congress’s program, you might see how nice it would be to have the capability.)

  19. +1 What Cory said. Yet again I am saved from having to try and articulate an argument because Cory has done it for me.

    No derivative work fixed; no public performance == no breach of copyright; and even if there were, as Cory said, ‘it’s STILL not Amazon’s problem’…

    I can’t believe that anyone is still arguing about it.

  20. Great points, but doesn’t text-to-speech hamper your ability to sell (or your agent’s ability to sell) audio rights elsewhere? Just as some paper publishers wibble that posting a story to one’s website counts as ‘previously published.’

    Slightly tangential: what’s your concern about Kindle preventing publishing rights from returning to the author? Bezos was on Charlie Rose last night. He talked about books never going out of print on Kindle. Great for readers, not so great for writers, I think.

  21. lynD:

    I generally license my audio rights when I license the book, so, no, it doesn’t present a problem. And as many know, I published Old Man’s War here before publishing it in paper form, so that’s not a problem, either.

    I’m not worried about eRights keeping books from reverting back to the author. Just specify in the contract that if less than “X” number of electronic copies are sold over “Y” period of time, the title reverts to the author.

  22. I don’t think there are too many people clamoring to perform the audiobook versions of college textbooks. Or even the most recent issue of Asimov’s. Even though there are people with reading disabilities or vision problems who might like there to be audio versions of those things and more, the market is just too small for it to be worth anyone’s time. I expect even the nonprofit organizations doing it with donations, government money, and volunteer hours have to draw a line somewhere.

    People can listen to e-books on their computer, so I don’t see why they shouldn’t be given the tools and opportunity to do so on a portable device.

  23. The truth is that people are only going to actually use Kindle’s Text-to-Speech when they have no other choice.

    I own a Kindle 1, and I think it’s a great reader device. I also listen to books on tape in my car during my commute.

    Being in the middle of half a dozen books is par for the course for me. Still, it would be great to be reading a novel, then continuing in that same novel as an audiobook in the car, and then back again in the evening. I’ve tried to go back and forth with the tapes and printed editions of the same books, and it doesn’t work very well.

    I agree that a professional performance in an audiobook can’t be beat. However, I can see that that side could be improved. If this could work seamlessly, I would definitely be interested in it.

  24. I’m with Cory on this. The crux for me is personal responsibility vs. manufacturers’ liability for how their tools are used; whether or not there is infringement (which, in the case of personal-use I believe there certainly is not) is not really the point.

    Sadly though, since we as a society seem to be moving further and further from taking personal responsibility, the argument “It’s their fault! They shouldn’t have sold this to me!” is appearing more and more often. Just because Cory’s examples have thus far *not* been blamed in that way doesn’t mean they *won’t* be.

    In plenty of other areas (from coffee to guns) precedents are being set that absolve individuals from responsibility for their actions, and I suggest that keeping quiet while *that* shit is going down is a Bad Thing.

  25. It’s like saying I want baby monitors banned because one day, they’ll evolve into nannybots that will make parents obsolete.

    Well, no. Nannybots don’t exist; audiobooks do. The concern is that text-to-speech will be used as an end run around actually paying for audiobook rights.

    The actual, present legal effects of the Kindle and whether it violates any IT laws are separate issues that I prefer to leave to guys like C.E. Pettit, who do this for a living; but the reason for the uproar among some authors isn’t really about theory, so much as a crystal-clear understanding of what happens on the business end of writing.

    I mean, dude, we’re lawyers. We get paid to take a mile if we’re given an inch, and to make sure the other side doesn’t.

  26. Cory’s arguments aside, Bruce Lewis points out that using a Kindle to generate a spoken version of a text for your personal use is not a copyright infringement at all, because it’s not fixed in a tangible medium (you’re not recording the voice that is played), it’s not a public performance, and it’s not a transmission (radio or webcast). You don’t even need to bring “fair use” into the equation. Using a Kindle to read an ebook out loud is not infringement, period.

  27. Uh, that would be “IP laws”. This is my best hooman in the morning before coffee imitashun and all.

  28. The statement you make about a performance versus a recitation is a good one. The appeal of many audiobooks, other than their format, is the fact that they are read by interesting people. The appeal of Neil Gaiman’s “A Study in Emerald” audiobook is that it’s read by Neil, and the appeal of the World War Z audiobook is the cast of people who worked on it. An audiobook is purchased as much, I think, for who is reading it as for the format.

    Additionally, is the AG going to try and tackle Microsoft? Microsoft has had a text-to-speech feature of their computer for decades.

  29. @mythago: I guess my point was that nannybots don’t exist, but neither does the ability to convert plain text into something even slightly approaching the experience of a professionally-made audiobook. What we have now is a pale imitation of the real thing, like suggesting using a pair of pliers to tune my car radio can replace the knob on a permanent basis. It may get the job done, but to suggest it’s going to eat into the audiobook market is absurd. If anything, authors should be up in arms because their works are being butchered by text-to-speech.

    I guess ideally you’d split the principal rights for each eBook into “text” and “text-to-speech” (maybe 80/20), and let the authors decide whether to allow the Kindle to talk for them. If not, the book is cheaper. It may seem mean to authors, but if you tack on an ADDITIONAL fee for TTS use, you’re taking away my pre-existing fundamental right to ruin your writing by having a computer read it.

  30. Sure. But from the legal POV, you don’t say “oh, those don’t really exist yet, we don’t have the technology so we’ll worry about it when nannybots are invented.”

  31. “I’m not at all convinced that realistic and engaging computerized audio will be possible at any point in the near or even middle future without requiring a clear and obvious derivative work to generate it. When it is possible, I suspect AI will be at a point where it will also be able to generate actual novels…”

    Yeah, that’s exactly the thought I had when you were talking about what goes into a performance. Likewise, I agree with every other point you make.

    What (else) I don’t understand is why the WG is having this hissy fit now. Adobe Reader has had text-to-speech capability for years. “Oh no, someone might buy an Adobe e-book and have their computer read it out loud to them!”—yeah, I don’t see any difference between that and the Kindle 2 situation, but I never heard the former.

  32. One thing that kinda bugs me is that a lot of people making this asinine argument seem to be ignoring the idea that Text-to-Speech is a vital tool for many folks with disabilities. It’s not just a toy of dubiously fun value. A lot of books don’t come with audiobooks–many textbooks, for example.

    (Are they going to get pissy about other TtS issues? I know the Mac comes with a basic TtS. You can read ebooks on the Mac. Oh noes! The sky is falling.)

    This is where the intellectual-rights folks tend to fall down in their arguments for me. They would argue that they should be able to control every aspect of every setting their content might possibly show up in, and hamstring utilities which have very valid uses.

  33. @28

    I guess ideally you’d split the principal rights for each eBook into “text” and “text-to-speech” (maybe 80/20), and let the authors decide whether to allow the Kindle to talk for them. If not, the book is cheaper. It may seem mean to authors, but if you tack on an ADDITIONAL fee for TTS use, you’re taking away my pre-existing fundamental right to ruin your writing by having a computer read it.

    No, and double no! Suggesting that authors should be able to licence TTS use and text use separately is not only wrong, it’s dangerous and an erosion of fair use. Regardless of what the author wants, I am not violating copyright to have a computer read their text to me – no derivative work is fixed, and no public performance is made.

    The only reason that they _can_ sell audio book rights is because they are actually selling a licence to produce a derivative work! Just because they can see another potential revenue stream does not mean that the law should (or does) protect that stream – if it did, any new technology would be stifled, and fair use non-existent.

  34. I think it’s worth hammering over and over again on the point that any putative worry about a future in which text-to-speech technology is a viable substitute for human reading is, bluntly and somewhat ironically, fantasy. Not science fiction. Fantasy. The astounding thing about computer voice technology in 2009 is how very very little it has improved since 1979: the absolute cutting-edge of the technology right now is at best only fractionally more convincingly human than the old Intellivoice.

    Inflected voice generation is at the dead center of a Venn diagram of Really Hard Problems and Stuff We Basically Don’t Understand in computer and cognitive science. You can safely bet large amounts of money against significant progress being seen in your great-grandchildren’s lifetimes.

  35. I don’t think Amazon is being particularly villainous (in this case), but the Guild has touched on a useful question: what is it that authors sell when they sell electronic rights, and how much is that worth? The pretty innocuous TTS issue illustrates just how unprecedentedly flexible the ebook medium is, and that’s likely to be an important consideration for many authors and publishers going forward.

  36. @ mythago: My argument is that the entire argument is nonsensical (legal foundation or no) because TTS-as-a-substitute-for-audiobooks is just as plausible as nannybots. We might as well demand legislation governing the acceptable use of photon torpedoes. The point at which TTS meets the standard for being a derivative work is nowhere in sight, which means (and I know this is what you’re getting at) we’re spending a great deal of time arguing over an improbability that some lawyer dreamt up in the shower.

    Oh. New business line for law firms: junior associates arguing their client’s case in comments on blogs to further inflame a contentious issue, thereby drawing it out more.

  37. No, and double no! Suggesting that authors should be able to licence TTS use and text use separately is not only wrong, it’s dangerous and an erosion of fair use.

    I forgot to include “silly” tags around that bit, sorry. I wasn’t really suggesting that rights should be broken out. Not seriously, anyway.
    On the other hand, a lot of the world’s problems could be solved by telling the parties involved: “Sure, you can negotiate separately for X, but it’s not an additional right. You’re taking something away by breaking it out. So you can do it, but you’ll get paid less if you opt out. That’s your choice. Have fun with it.”
    Impossible to do, but it would make me smile to see it applied.

  38. Of course, the other way to look at it is “If you really can replace a human reader with an algorithm, why would you pay or charge extra for an audiobook?” If that technology ever existed, the book publishers would most likely fire the readers anyway and churn out an automatically generated audiobook.

    I just wish they’d hurry up and get everything into one unified package. I want to be able to read a book normally, then plug the same book into my car and listen to the audiobook from where I left off.

  39. WG and other DRM types miss an important factor, irritation of the consumer –
    example 1)
    the problem w/ kindle2 is that it doesn’t have a flash ram slot (can’t easily import files) nor does it support PDF or other standard file types – when TOR.COM wants to give me a FREE e-book to get me hooked on a new (to me) author – I can’t view it on my kindle!

    example 2)
    once I buy a work – it’s mine – don’t try and limit the way I choose to enjoy it as long as it’s for personal use – if I buy an e-book – and want to hear it via TtS or print it out and use it for wallpaper – it’s nobody’s business but mine (get off of my yard – grrrr!) – If I buy a music CD and want to listen to it on my IPOD this week and ZEN next week – don’t embed DRM code that screws with my computer (are you listening Sony?)

  40. Even if the Kindle DID perfectly inflected audio…someone has to code the inflection choices and dialogue. And the choices made by some geek (who we’ll assume shares the same language he’s ‘coding’ as his own native tongue) won’t be as good or adequate to the task.

    I’ve heard John Hodgman read his audio book. I’ve heard Philip Pullman read His Dark Materials with a cast. I’ve heard Jim Dale do the Harry Potter books and Rob Inglis do the Lord of the Rings. Professional performers with highly-lauded talents. It seems the height of hubris to suggest that a Kindle…even one programmed extremely well with some super-audio software not yet developed, will match the skills of these presenters. Is it possible? Certainly. Is it likely or even actionable when much simpler alternatives exist? Most likely not.

  41. Persiaon@18: last I heard, Amazon’s reader has a touch screen. That’s not going to work for a blind user, TTS or no. So our only involvement in this is as kind of an interesting sideshow. Although when an ebook is $5, and the same audiobook is $24, I almost always purchase the ebook and let my computer read it. But the fact that I usually like TTS better than a human reader perhaps has more to do with the fact that I listen to TTS 8 hours a day, and so I find a new voice reading to me kind of distracting. The attention this is bringing to TTS technology is kind of nice, though. The more people who know about it, the more likely it is to get improved.

  42. I forgot to include “silly” tags around that bit, sorry. I wasn’t really suggesting that rights should be broken out. Not seriously, anyway.

    Yeah, I kind of guessed it was tongue-in-cheek; I just wanted to be absolutely clear about the fact that any attempt on their part to try and split out a separate TTS license is futile and legally meaningless. It’s not going to make one jot of difference to what I can do with the electronic version I buy.

    They could try and lock that functionality away behind DRM, but that does not change the fact I break no law if I did use TSS somehow without circumventing the DRM (camera + OCR or something). Which incidentally is why DRM is so bad – it has nothing to do with stopping infringement, and everything to do with stopping legal use to extort more money out of a paying customer by eroding their fair use rights.

  43. It all boils down to four separate and unrelated points, any one of which alone makes this action stupid:

    1) Is “Text-to-Speech” wrong, either legally or ethically? -> Certainly not.

    2) Is “Text-to-Speech” a threat to authors’ incomes? -> Almost Certainly not.

    3) Is “Text-to-Speech” a critical feature for the visually impaired? -> Certainly.

    4) Is there, pragmatically speaking, any chance of actually preventing people from using “Text-to-Speech” given expected computing trends? -> Almost Certainly not.

  44. Oh, one of my favorite subjects. First, I do not have a Kindle, but I want one. My only real concern is that there are still so many books not available on Kindle (Iain Banks books), so that is a bummer. Of course they are not available in audio either, but I digress.
    The text to speech stuff is never in my lifetime going to match a good professional reader. Take for instance the guy who reads Scalzi’s Old Man’s War stuff. Awesome reader!! He does different voices for each of the characters and genuinely acts as he reads. I imagine the guy reads the book first to figure out the type of character (hero type, wimpy type, etc) before tackling the job. Text to speech will never get that right just by doing a phonics job on the words.
    As for audio books, I love them. I spend an hour to an hour and a half commuting each day. Listening to a good book keeps me sane and from using my car as a weapon on some people. But the text to speech part of Kindle just would not be the same.
    As a side note, John S. mentioned something about people not buying both Audible audiobooks and print/Kindle versions. I am sure most do not; however, I for one do. Usually I do this for the larger books and would still do so with a Kindle. Reason being that some of those books out there are just huge. For instance, for Hamilton’s Pandora’s Star series I bought both the Audible version and a used paperback so that I could read on the weekends etc. Same goes for Reynolds’s Revelation Space. As a matter of fact I am waiting for Redemption Ark to be released on audio in March before plowing on. So, if they ever did the Kindle/Audible deal I would most likely jump for joy!

  45. #38 jasonmitchell – I checked the Amazon site and it says it does support PDF. I imagine that is unprotected PDF so I am not sure if that works for you. One thing I think they did take away is LIT support. I seem to recall that the first Kindle could use LIT format (unprotected of course).

  46. John, this is the best, most coherent commentary I’ve read on the topic so far.

    I’m also happy that Cory commented here, because it helped me see the real essence of his point, which essentially mirrors the anti-gun-control argument “Guns don’t kill people, people with guns do.” I accept that argument with respect to gun control, and I accept it with respect to Betamaxes and Kindles, too.

  47. John,

    your argument sounds remarkably similar to Eric Flint’s argument regarding ebooks vs paper and the benefits of giving stuff away for free. Indeed I’m willing to bet a significant sum of money that ‘text to speech’ will increase the number of audio books sold to a kindle customer.

    Here’s why. The quality of the reading when done by a human who has read and digested the book first and who is a good reader is going to sound enormously better than a machine version for a few decades to come (at least – possibly forever). Someone who buys an ebook on the Kindle may enable TTS for a chapter while driving instead of listening to Rush Limbaugh. He then realizes that, damn, he could actually listen to books during his commute every day. But after trying TTS for a few days, the robotic voice and lack of inflection mean that he wants to pay for a real person reading it. So off he goes to Audible and buys the recording and Amazon collects an additional $10.

    I note that Amazon actually has an incentive here to have relatively bad TTS technology.

  48. last I heard, Amazon’s reader has a touch screen. That’s not going to work for a blind user, TTS or no. So our only involvement in this is as kind of an interesting sideshow.

    I wasn’t sure. I do think it’s important for rights going forward though– I think a blind-friendly Kindle is far more likely than a speech reader that can automatically code inflection and emotion, for example! (It also might be possible to get it to work for people with visual impairments that make reading on a traditionally-sized screen difficult, if you can ‘resize’ the controls, but again, I don’t know enough about the Kindle to say.)

  49. #45 Jake: This was the point I was making (#24) as well – though I think it goes beyond the specifics of books or guns and touches on people’s deteriorating sense of personal responsibility.

  50. Gneh. For me, buying an Audiobook is an entirely different value-add. You’re also paying for a performance artist to lend their highly-skilled larynx.

    Even if TTS tech gets to the point where it can get all chokey-uppy like Wil did over his Sunken Treasure, it’s still just a hunk of silicon. I know it, and you know it.

    The whole thing’s a pile of crap, designed to get page-hits on their website. That’s my slightly-cynical take.

  51. FWIW, I’d say something around 2/3 of the audiobooks I get from audible are for something I’ve already read (and most likely own) in dead tree, or an electronic edition. Why? Because when I’m driving somewhere, in heavy traffic, and listening to an audiobook I like something that I don’t have to pay close attention to, to follow along.

    Audiobooks of stuff I haven’t read before? Those get enjoyed sitting in my recliner at home.

  52. Audiobooks are for when I’m in the gym, and boinging around too much to read text but also bored shitless and therefore need to keep my brain occupied so the utter boringness of the gym doesn’t send me home early.

  53. I have to admit that I am struggling to see what all the fuss is about. What is the difference between a person reading a book to another and TTS?

    Last I noticed, parents were not being rounded up for reading to their children, nor were people who read books for elderly/ill/disabled friends/relatives or people who read out loud to themselves, so why go after TTS?

    The argument that this is an attempt to protect against future technology doesn’t take into account the fact that it will take either an AI of comparable intelligence to a human or a highly specialised expert system to read an unadulterated ebook with the correct emphasis and passion.

    At that point maybe it would be the time to make these arguments. The law is a constantly evolving entity and often changes to reflect new circumstances as attitudes change. Trying to guess at the possible circumstances and technologies we will see in the future is not the best way to make good law.

    At least in the UK the Kindle’s TTS feature when used by people with a disability which affects their ability to read will be protected as a reasonable adjustment to allow people to use/access goods/services by virtue of the Disability Discrimination Act.

  54. MCM @35, was that last line a snark? I don’t practice this area of law and really have no dog in this fight, but if you wanted to accuse me of trolling or bad intent you should probably just out with it.

    “But it’s not as good as an audiobook” is different than “It’s not the kind of work that counts as an audiobook, legally speaking”. The technology to turn TTS into a replica of human speech isn’t there yet, particularly as to highly intonal languages like English. But it’s not “fantasy” to consider the implications when TTS and audiobooks exist, any more than space law is a fantasy because we don’t have colonies yet.

  55. mythago @54, definitely not a snark. I was thinking (as I was writing my own words) that I would do well in that position, playing devil’s advocate in comment threads for fun and profit. I definitely see your point, and I don’t think you can technically be trolling if you’re making sense :)

    (I am resorting to using smileys to help communicate my sincerity. Please assume I am not being sarcastic. I know I sound it, but I’m really not)

    As far as the entire argument goes, my thinking is now this: whether or not TTS will ever be capable of human-like delivery, the current sophistication of TTS has been around for at least a few years, and yet the AG has suddenly decided to push the panic button*. Rather than spending time and money to fight for something that is largely unimportant (and ultimately boils down to subjective “is TTS good enough” bickering, relevant or not), why not invest in proper audiobook promotion (as has been suggested elsewhere)?

    It’s not to say that SOMEONE can’t be thinking about this issue. But when you consider all the other things the AG could be tackling instead, it seems kind of silly to be putting resources into this question in any serious way. There may be space colonies some day, but do you really want to put your best minds on THAT subject when the world is crumbling all around us? Just because the law CAN go somewhere, doesn’t mean it needs to. At least not immediately. There is such a thing as being over-prepared way too early.

    (Speaking of which, another issue would be to try and define an undefined technology before it exists, and plan an IP strategy around it. TTS may become the biggest invention in the history of publishing by 2015, or it may be completely forgotten at the advent of some magical audio-enabled superbooks. If we create a box to grow the technology into, we’re putting inherent constraints on what people will imagine, and we might be shooting ourselves in the evolutionary foot)

    * The panic button is obviously being pushed because they smell money in the air with all the Kindle hype. But that doesn’t mean their membership (or the world at large) should play along. This is a wholly made-up catastrophe. If you close your eyes, everything WILL be okay, so it’s obviously not that big a deal.

  56. mythago: the vast majority of us own the technology to turn a book into an audio representation of a book. It’s called a “larynx”. I can take a book and perfectly legally read it out loud to another person. The only difference between that and “text-to-speech” software is the technology that does the conversion.

    Hell, if I wanted, I could go hire a voice actor to do the same. If I were a complete rich bastard, perhaps I could even hire Neil Gaiman to come over to my house and read “Graveyard Book” to me.

    The only way that “text-to-speech” is a copyright violation tool is if I take its output and sell it or give it away. But there again, I could do the same by reading the book with my eyes and speaking the results with my mouth. The technology has nothing to do with the copyright violation.

    This is the fundamental error here. Copying is not a copyright violation. It is, for example, perfectly legal to buy a book and copy it out long-hand, scan it in with a scanner, or whatever. Copyright only comes in when you try to then give it to another.

  57. Well, I can see at least one issue with TTS:

    “Quick! Somebody Kindle up The Eye of Argon! We’ll hide it behind the sofa at the next con and…”

    What? Am I the only one?

  58. I don’t get what the big deal is. If I buy a hard copy book, I could read it myself or I could have a passenger in my car read it to me. (If I could see and, um, drive.)

    Anyway, in that case, one could argue, two readers are utilizing the book and only one was purchased. I mean, why aren’t libraries copyright violators? (As you can tell, I’m not a lawyer, I’m just looking at this from an end-user pov.)

    So, Kindle is a matter of access and mode. You can read it, or listen to it, or a combination of both. Who cares? I get the case for the audible version of books more, because one could argue that the reader/performer is adding a layer of artistic value to it that should be licensed and compensated for.

    FWIW, as a blind person, I STILL prefer electronic voice to live voice. The MAC has a supposed humanistic voice (called Alex? I think?) and we just laugh at it because you can hear it breathe between sentences. It IS ANNOYING.

  59. Datum from our household: usually we only buy one version of a given work, but often make an exception for John Le Carre whenever there is an audiobook which Le Carre reads himself. John Le Carre is a remarkable voice talent (he does accents, he does different voices) and the timbre of his normal speaking voice is a delight to audit. So if an author reads his own audiobook, we buy it. But still, a codex is so much more portable, and doesn’t require special equipment, so we often buy the paper book too. But it will be a very long time before anything computer generated can compete with Mr. Le Carre’s readings, so your point about the non-excitingness of the Kindle’s new superpower still stands.

  60. steve @56: Right. And the vast majority of us own the technology to make a duplicate image of a book (called “eyes” and “memory”), therefore what’s the problem with making photocopies of a book I like for all my friends? Again, the issue is that traditionally audiobook rights are paid for separately. There is concern that TTS is going to end up being a way to get around paying authors for something that they formerly got paid for.

    Now, I have no opinion as to whether or not this is correct, and I believe Cory Doctorow is right that Betamax may well apply to the Kindle. But I do understand why there is a kerfuffle about this issue. “We’re not there yet” is very different from “We will never be there”, and way, way different than “We should not pre-emptively worry about the potential implications of this stuff when it gets better, particularly if we’re lawyers whose job is to make sure our clients get the best deal possible and preserve their rights.”

  61. I’m not sure how I feel about the copyright portion of this issue. Like people have said, we’re not there yet, but it’ll be extra cool if we do get there some day. And if we do, then authors may lose income.

    But I am deeply concerned with accessibility. As a visually-impaired person who will likely (if genetics have their say) lose his sight well before he dies, I’m excited about the Kindle’s potential. The combination of audible books and Amazon’s huge selection is extra sexy.

    Being the biased (and overly-entitled) person that I am, I lump TTS-based income losses together with wheelchair ramps and handicapped parking spaces. Someone had to pay for them, but it’s the price of living in a conscientious society. When organizations like the Authors Guild pit their debatable financial gain so squarely against my ability to access a book, I get very tetchy. Oh yes I do.

  62. I think the thing that will make text-to-speech better is to approach text-to-speech similarly to the way that Babelfish and Google Translate translate languages from one to another. Don’t think of written English and spoken English as the same – think of written English, and the phonemes that need to be fed to the speech synthesizer, as different languages. Build up a sufficient corpus – and the machine translation gets better. Due to the large corpus size this type of translation is more practical for Google than Kindle – not super hard work, but lots of data to push around.

    Good machine translated text is good enough for news stories, but not good enough for novels and poetry. I think this will still be the same for improved speech synthesis for a long time – still plenty of work for voice actors to make excellent audiobooks. Ananova may be a good example of my point – synthesis is good enough for news….

  63. The Kindle 5, with advanced AI Text to Speech, suffered a nervous fit when at a Science Fiction convention, someone uploaded a story known as “The Eye of Argon” and in the third sentence the Kindle 5 began laughing uncontrollably. Three people, who tried to turn the Kindle 5 off, accidentally looked at the text on the screen and required hospitalization themselves.

    Yeah, I can understand why Skynet is pissed and wants to waste humanity.

    Dr. Phil

  64. I listen to around 250 books per year while working around the farm. My choice is driven by both author and reader. There are some authors I have read completely, and there are some readers for whom I have done the same. I mourned the loss of Frank Muller’s voice no less than I mourn the loss of a favorite author.

    No synthesized reading will ever reach that status, no matter how technically good.

  65. steve @56: Right. And the vast majority of us own the technology to make a duplicate image of a book (called “eyes” and “memory”), therefore what’s the problem with making photocopies of a book I like for all my friends?

    There’s absolutely nothing wrong with making the photocopies.

    Giving those copies to your friends is, of course, a violation of copyright law.

    Again, the technology already exists for creating an audio copy of a book. It is called a voice. And laws already exist to protect authors from people using their voices to create audiobooks and sell or give them away.

    This is no different from demanding that books not be read aloud to children or blind people because it might cut into audiobook sales. This is not about “preserving” rights. It is a blatant attempt to gain new ones.

  66. This is not about “preserving” rights. It is a blatant attempt to gain new ones.

    That’s a nice sound bite but whahuh? New rights? The question is whether good TTS will be an end run around having to purchase audiobook rights. The fact that you can talk means no more to this question than the fact that you can see and remember has any relevance to photocopying.

  67. But will you get royalties in the future? Whether or not authors feel that Kindle’s current audio reading feature is a threat or just a promotional plus, Amazon grabbed a form of audio rights without getting authorization and makes money on the audio feature by it making the Kindle more appealing, and they don’t have to pay authors or publishers for these rights. So if they go forward with joint audio and electronic files, they may claim that their electronic print rights encompass electronic audio (as opposed to CD audio,) as evidenced by use in the Kindle, that they had these rights all the time, and so not give you any extra royalties or fees on actual audio recordings bundled with electronic print. You’re counting on Amazon to play fair, but as we’ve often seen, companies don’t always play fair.

    I bring this up because of what happened with electronic rights in the first place back in the early 1990’s. Book publishers traditionally got electronic rights as part of their license because they weren’t really exploited. With CD-Roms and the Internet, those rights became more valuable, but publishers claimed that since they had them in the past, they should continue to have them and pay authors the very small royalty attached to them, instead of paying more for those now valuable rights and giving a decent royalty. Authors, author groups and agents had to go to war with the publishers to both get publishers to pay up to authors on the money they were making off of electronic rights and to allow authors the option to keep electronic rights and sell them to electronic publishers directly instead of through publishers.

    The same thing occurred with audio rights — the publishers claimed they owned them as part of the book deal — and as part of the electronic rights of that book deal — and it was only concerted efforts that allowed authors to get additional money and decent royalties from publishers for audio, and the option to sell those audio rights directly to an audio company like Audible.

    Authors and publishers give books for the blind for free, sell books to libraries which lend them to thousands, give free books away on-line and in print as promotions. But those are authorized decisions. Amazon did not have authorization, and so now it’s a question of what rights Amazon has under the contracts it has made. Amazon isn’t going to be hurt by this clarification and revision of their contract, electronic sales for the Kindle are not going to be affected, and authors and publishers will benefit, now and in the long term. It doesn’t really matter what form the audio is in or will be in, or what the technology can do. It’s a matter of what Amazon contractually is allowed to do. If some authors want to give companies electronic audio rights and earn no money for it, they can, but authors should have the choice, not have it made for them.

  68. I am the author of four books and a former member of the Authors Guild. I joined, as John would say, as an affectation. I figured it would be cool to get to hang with some of the writers who are members. Doesn’t happen. What SFWA has available for its members in terms of relating to others is significantly better.

    I dropped the membership a few years ago because there just didn’t seem to be a point.

    Apparently, there still isn’t.

  69. @WizardDru 39:Even if the Kindle DID perfectly inflected audio…someone has to code the inflection choices and dialogue. And the choices made by some geek (who we’ll assume shares the same language he’s ‘coding’ as his own native tongue) will be as good or adequate to the task.

    Umm, why would the choices be made by some geek? If you’re going to sit somebody down with a text and have them hand-inflect it, there’s no reason that task wouldn’t be done by a professional orator who happens to have a sore throat that day.

    That said, when somebody is making a changed version of the text that is more suitable to an automated reader, there’s a derivation going on there, isn’t there? Didn’t Scalzi already cover this one?
    So that’s dealt with. But what if instead of trying to birth a book-reading AI you instead and somewhat more simply created markup related specifically to a work (say, a markup specifically meant to read Zoe’s Tale)? Well, then what you’ve got there is very definitely a derivative work, and you’ll hear from my lawyers.
    Right… there it is.

    FINALLY, why do you think that geeks are especially less suited to making these decisions than anybody else? Wil Wheaton comes to mind…

  70. You know, my reading skills seem to have fallen off of late. Disregard my previous comment @69.

    Instead you have this:
    Why would a programming geek choose their OWN inflection patterns over a professional’s? The programming geeks involved are either:
    A) in a commercial environment (i.e. hired to make Kindle sound better) in which case the person hiring them has to say, “Here’s your model speaker,” and shove another person in the room with them.
    B) in an academic environment (i.e. interested in the problem for personal reasons) in which case they’ll put up a notice in the psych building asking for volunteers to read stuff to them, so they can do it right. (For a value of right which means “like a normal human being”).
    In case of (B), you’ll still fall below the reading quality of a professional orator, but at least as good as your mother tucking you into bed, if only she wasn’t tired when she did that…

  71. I’m not a lawyer, but it seems to me that if the Kindle’s text-to-speech is an infringement, then so is my making my own recordings of books I own, for my own use.

  72. As an audio producer (who has worked on a couple of mass-market audio books along with some radio drama) this *perfectly* articulates my feelings on the whole ridiculous matter. We’ll have an uncanny valley for speech long after we’ve filled in the one that currently exists in photorealistic CG.

    Thank you!

  73. So! Since Amazon caved and is now giving the publishers/authors the power to enable or disable TTS in their Kindle books, how does everyone feel about it?
    What are the odds that 99% of books AREN’T locked down this time next month? Or will Amazon willingly pay to enable the feature for its users? Or just offer two versions of each book: TTS or just-text? In which case, how much do you pay for a sub-standard audiobook? God knows publishers will see this as an opportunity to charge an extra $5 per copy.
    Coming at the same question from a new reality, how does it feel to know Hollywood can choose if and when your Betamax can record?

  74. What saddens me about all this is that the position of the Authors Guild is being voiced by its current president, Roy Blount, Jr., who is as sharp and funny a writer as I can think of (and if you’ve ever heard him as a panelist on NPR’s “Wait, Wait, Don’t Tell Me,” you know how witty he is). To hear someone I greatly respect hold forth on a position so lame makes me realize how hard it is to be objective about issues that affect one’s profession.

    And I have to admit that I’m a little biased myself. As a Kindle owner, I want to see Amazon succeed in a big way, so bumps in the road like this one annoy me.

  75. So it’s better for Amazon to hold on to all the extra money it makes from having the text to speech function — selling Kindles to the visually impaired and those who like the feature — rather than share any of it with the authors or print publishers? I’m just trying to get a handle on the thinking of authors here, because it seems very short-sighted. Authors are happy to get some cash (granted not tons) for audio rights to their books these days, but they wouldn’t have any of that cash if authors groups like the Guild hadn’t fought tooth and nail to get publishers to pay up or release the rights back when there was practically no money in audio but it looked like there would be. They are happy to get royalties for Kindle editions of their books, but they wouldn’t have those royalties if the groups hadn’t fought to get publishers to pay up or release the electronic rights to authors before the Internet had barely gotten started.

    The actions of the Guild have not created a bump in the road. Amazon has backed down, claimed no wrong-doing, but acknowledges that rightsholders have the right to determine whether they grant Amazon the rights to electronic audio or not. So the authors/publishers who have no problem with the feature can grant it to Amazon and leave it on. The ones who feel it is a problem can turn it off and keep the rights. This doesn’t, of course, force Amazon to share their profits from this feature as yet, but it does give authors more options later on, and it means that other companies have to negotiate for the rights, not just grab them. It seems an okay resolution.

    Look, all the electronic stuff is tricky. But it’s better resolved by publishers (and in this case Amazon is a publisher) and authors and booksellers working it out together, rather than one company doing whatever it wants. If Kindle readers want electronic editions prices to go down, this is actually a way to do it.

  76. KatG:

    “So it’s better for Amazon to hold on to all the extra money it makes from having the text to speech function — selling Kindles to the visually impaired and those who like the feature — rather than share any of it with the authors or print publishers?”

    Amazon makes extra money from this? Doubtful; it’s not making any extra money when the Kindle user implements the text-to-speech technology on a book they already paid for. So this question, aside from being misleading in its presentation, is basically null. Any “extra” money Amazon makes from this is the almost-certainly small sliver of people who choose a Kindle over another eBook reader specifically because of the text-to-speech capability, and I doubt there are enough of those to make up a pot worth bothering about.

    I don’t have a problem with the AG bitching and moaning about this, but Amazon’s correct in this one and AG’s not: a personal computerized recitation of a book one has already purchased is neither illegal nor wrong, nor is it at all likely to draw sales away from audiobooks. Since I find it highly doubtful the function takes any money from me or from other writers, I have a hard time getting worked up about it, or caring that the AG has.

  77. So! Since Amazon caved and is now giving the publishers/authors the power to enable or disable TTS in their Kindle books, how does everyone feel about it?

    Let down and pissed off; I expected better from Amazon.

    I am about to send an email to the Authors Guild letting them know that I will not be purchasing _any_ work, digital or otherwise, from any author that blocks TTS on their electronic editions. A script to trawl the guild membership list and email the same thing to each member may well be in order too.

    Unfortunately, I doubt it’s going to be the authors in control; it’s more likely that the publishers are going to make that decision, and it’s pretty easy to guess which way they will act. Ah well, my reading queue is too long as it is, maybe this will at least slow down its rate of growth.

    I’d boycott Amazon too if I thought I could stick to it, but at the very least they will be getting a snotty email as well. Gutless.

  78. Given that the closest analogy to this is closed captioning on TV (essentially just the reverse of “text-to-speech”) and given that it is not only allowed but legally required in a number of contexts, I wonder if either the Authors Guild or Amazon are setting themselves up for a lawsuit from disabled groups.

  79. “To my mind there’s a substantial difference between a computer voice reading text which a consumer has already purchased, which to my mind is not a derivative work, and a computer voice reading audio under directions specific to a work, which certainly is.”

    This doesn’t make any sense to me. Doesn’t your argument include the idea that having a text you legally own read aloud by your computer is no different than having it read aloud by your best friend? And what if that friend practises a bit on his own and develops inflections specific to, say, A Christmas Carol, that he wouldn’t use when reading The Neverending Story?

    By your reasoning, Mr. Scalzi, doesn’t that make your best friend’s tailored, intelligently designed reading, a ‘derivative work’?

    I don’t think you can have it both ways. Inflectional dictionaries specific to a work are like a random internet user collaborating with a computer to read aloud to you a copyrighted work you have already paid for. Still perfectly fair use in my book! In any case, it’s dead stupid to quarrel with such people’s activities — they have already paid you and they are now advertising your work via audio *without* providing a digital copy (i.e. “reading”), thereby propagating interest without possession — a perfectly poised market. It’s insane to prevent this sort of thing in any way.

  80. I was thinking the same thing as Steve Burnap.

    Laroquod, when I read stories to my children, I’m providing a private performance of the work. (If I did a public podcast of my readings, that would be a public performance and therefore a violation of copyright.)

    On the other hand, if I publicly augment the original text with a detailed set of instructions on how to read this text, that’s a derivative work that includes, among other things, my own interpretation of how the work should be performed. That’s different in two ways from reading to my children: first, it’s public; second, it’s not a performance, but a set of instructions on how to perform the work, which couldn’t exist if the work didn’t exist, which makes it a derivative work.

    I don’t think John’s trying to have things both ways. I think we’re talking about two different things.

  81. BTW, I’m not saying that one should prevent people from providing the instruction set. It might be good for authors and publishers to allow this sort of collaboration. I’m just arguing that an inflection-set is a derivative work.

  82. Jake Freivald:

    “I don’t think John’s trying to have things both ways. I think we’re talking about two different things.”

    Correct.

  83. The extra money they make is by having the feature of text-to-speech, which is a sales feature that makes the Kindle more attractive to buyers, thus earning Amazon more sales. Which is why they have it in the first place. It’s a benefit that allows them to exploit the author’s works more fully. And if they have it, other readers coming down the pike are going to want to have it. Essentially, they improved the appeal of electronic print books by offering an electronic audio option with it. It matters not a whit whether it’s a boring computer voice or an actor doing inflections. Whether it infringes on the audio market or not is almost irrelevant. The bigger concern is that Amazon took rights they didn’t have to improve sales of their product, without purchasing the rights or getting authorization for them from the rights holders. They did a land grab.

    A reason to do this for Amazon is that it establishes contractual precedent. However electronic audio does morph, Amazon can claim that they were given the rights in the first place as part of their electronic print deal, so they don’t have to pay for them as they become more valuable as a sales option, just as publishers did with audio and electronic rights. And indeed, Amazon is doing just that. While they agreed now that it would be the rightsholders’ choice to have the electronic audio option or not, they claimed they had the rights already so it was not illegal. What was very clear was that they didn’t have the rights. They took them without consulting with their business partners — the publishers and authors.

    I’m not worried about the Kindle succeeding because it’s succeeded enough to encourage a lot of competition coming round the bend and to cement e-readers into the technology market. Putting some brakes on the electronic audio option is not going to change that. But that means a bigger market with more options that will give the author more earnings, just as electronic print and audio give you more earnings now, earnings that publishers didn’t want to give you initially because the market was too “small.” They’re still small, though electronics may get very big, but they are a source of income. You said you licensed the audio rights to your work. That’s because authors fought publishers for your right to be able to do that. You might have been okay with publishers holding on to the rights as part of the package for print rights, but ain’t it nice to have the option?

    Authors can’t do concert tours with $300 tickets, and since authors are often making more money now from the sale of foreign and subsidiary rights than they are from domestic sales, Amazon’s little electronic print market is the next battleground. It’s not that they or publishers are the enemy. It’s just a matter of authors having a voice in what will happen to their work, even if the decision eventually becomes that a lot of stuff is free and open and anyone can use it. There are a lot of aspects of reproduction, copyright, etc., that can’t be controlled, and some like Doctorow argue shouldn’t be controlled, but this isn’t one of them. This is Amazon trying to sneak around their already established contracts with publishers and authors for electronic print. So they got rapped on the nose, and they complied. Happens all the time — they aren’t pissed off about it.

  84. I don’t think there is any ‘set of instructions’ distinction in copyright. A performance *is* a derivative work. Not all derivative works, however, are copyright infringements; if it is a personal performance then it’s fair use. Text-to-speech is a personal performance. Interpretive cues for the software are like director’s notes for the reader … they are process notes and not really a derivative work, actually. They are neither a performance nor do they contain any of the original work: therefore, they are simply not copyrightable. You might even be able to call them an analysis of the original work since they simply delineate what to emphasise and what not — much like a highlighter. A copy of a book with words or passages highlighted is not a derivative work, because what you’ve added to it is not actually copyrightable — much like the case of the interpretive cues.

  85. KatG:

    “The extra money they make is by having the feature of text-to-speech, which is a sales feature that makes the Kindle more attractive to buyers, thus earning Amazon more sales.”

    Bah. As noted previously, it’s only “extra money” if the feature is the thing that is the deciding factor to buy a Kindle over another type of eBook reader. I don’t suspect it is; I would suspect the Kindle’s book selection and its always-on connection to the Internet (plus limited web capability), to give but two examples, are much larger draws. Basically, KatG, unless you can come up with some actual proof that the text-to-speech feature is actually in itself moving Kindles, what you have here is not an argument, it’s an assertion, which I’m not obliged to give credence. You’ve made an assertion: Back it up.

    As regards the rest of your pro-AG polemics, that’s fine, but as I’ve noted before, in point of fact I think Amazon is correct here in its reasoning, not AG, and I’m not sure why I should be thanking AG for doing something I think is legally incorrect, either for the action itself or because you appear to think that credit accrues because in the past the AG correctly defended authors’ rights. Just because they did useful things before doesn’t make this useful or correct.

    At best, I’m neutral about the AG’s actions, since I don’t think it harms sales of my work, or the work of others, though I don’t think it helps them any, either. On the other hand, it does create marginally more work for me to explicitly give Kindle owners permission to do with my work something I think they had a perfect right to do anyway, so AG doesn’t receive any thanks from me for that.

    And at worst, the AG is trying to carve away the public’s right to fair use, which is pretty dickheaded and not something I’m willing either to defend or celebrate. The argument that Amazon doesn’t mind abetting in this paring off of the public’s right to fair use doesn’t make it any better, and doesn’t cover Amazon in glory either.

  86. You seem to misunderstand what markup means.

    It doesn’t mean the computer would try to understand how each character speaks. Rather you’d have something like CSS for voice.

    You’d have a list of general definitions:

    Joe uses a male, baritone voice, pronounces each word slowly and methodically, with precise pronunciation.

    Clara uses a female, contralto voice, speaks 25% faster than normal and with a lisp.

    Then you’d place markup in the text, to indicate that this is spoken by Joe, and that by Clara, and add some more specific formatting, to indicate when they whisper, shout, or put a particular emphasis on a word.

    There would be instructions to indicate the lengths of pauses, speech volume, and pitch changes. Perhaps it would even be able to produce things that aren’t words, such as coughs and screams in the appropriate voice.

    There would be no discernment and no heuristics; the TTS engine would apply the precise pronunciation it was told to each word even if it sounds completely ridiculous.

    This well might put a voice actor out of a job, but replace them by a new profession, let’s call them “speech editors”, who understand the technical details of speech and intonation.

    I don’t think this is beyond the possibilities of current technology; we simply haven’t needed something with this much precision yet, as most TTS applications are expected to deal with arbitrary text that wasn’t specifically intended to be read.
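To make the “CSS for voice” idea in the comment above concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Voice` and `Span` types, the `compile_spans` function, and the directive fields are illustrative names, not the format of any real TTS engine. Joe's rate of 0.8 is an assumed stand-in for “slowly”; only Clara's 25% speed-up comes from the comment itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Voice:
    """General definition of how one character speaks (the 'stylesheet' part)."""
    gender: str
    register: str
    rate: float = 1.0              # 1.0 = normal speed (assumed convention)
    traits: Tuple[str, ...] = ()   # e.g. ("lisp",)


@dataclass(frozen=True)
class Span:
    """One marked-up stretch of text (the 'markup' part)."""
    speaker: str
    text: str
    volume: str = "normal"         # "whisper", "normal" or "shout"
    pause_after_ms: int = 0


# Hypothetical profiles based on the two characters described above.
VOICES: Dict[str, Voice] = {
    "Joe": Voice("male", "baritone", rate=0.8, traits=("precise",)),
    "Clara": Voice("female", "contralto", rate=1.25, traits=("lisp",)),
}


def compile_spans(spans: Sequence[Span], voices: Dict[str, Voice]) -> List[dict]:
    """Merge each span with its speaker's profile into flat directives.

    No heuristics: an undefined speaker raises KeyError instead of
    falling back to a guessed voice.
    """
    directives = []
    for span in spans:
        voice = voices[span.speaker]
        directives.append({
            "text": span.text,
            "gender": voice.gender,
            "register": voice.register,
            "rate": voice.rate,
            "traits": voice.traits,
            "volume": span.volume,
            "pause_after_ms": span.pause_after_ms,
        })
    return directives
```

Calling `compile_spans([Span("Clara", "Run!", volume="shout")], VOICES)` yields one directive carrying Clara's contralto register and 1.25 rate alongside the span's own volume. The split between general definitions and per-span markup mirrors the stylesheet-versus-document split the comment alludes to.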

  87. Laroquodon:

    Fair enough with respect to the distinction between set of instructions vs. performance; with respect to copyright law, I may have been making a distinction without a difference.

    But you appear to be claiming that the “interpretive cues” aren’t instructions and therefore aren’t copyrightable. First, instructions definitely are copyrightable: think recipes, code snippets, automotive repair manuals, etc. Second, the “cues” are definitely instructions, because instructions are what computers execute. If they weren’t instructions, the computer wouldn’t do anything with them. So why wouldn’t the interpretive cues be copyrightable?

    (You mention director’s notes, saying they’re not copyrightable, but DVDs have directors’ commentaries, and you can’t legally rip them and post them to YouTube.)

    The real question here is, do instructions on how to read a work violate the copyright of that work?

    A follow-up: Would it matter if the markup were rendered in a way that contained none of the original work (e.g., “Words 212-214: interrogative”)? That’s perhaps the most interesting question your post has raised.

    FWIW, I don’t think your highlighter analogy sheds any light on the matter. Selling a book doesn’t violate copyright, so selling a highlighted book doesn’t, either, even if the highlighted book is a derivative work. (And there are clearly situations in which a highlighted book could be a derivative work.) That said, this raises a question: if you created, say, an onionskin overlay that highlighted specific words in a book, but didn’t contain any words from that book, would it violate the copyright of the book?

    For that matter, do the Cliff Notes people have to pay copyright holders when they create Cliff Notes for a work? Knowing that would shed some light on the markup issue.

    On a completely different note: VT, you seem to have a very narrow view of what could be done to mark up an eBook.

  88. My take on the issue is that the “TTS Quality” argument plays right into the AG’s hands. Their response will almost certainly be that a violation of rights is always a violation even if it’s done badly. A stack of lousy photocopies of a book is still a violation of the author’s copyright, even if almost nobody would prefer it to a nice crisply printed trade paperback. Indeed, they could argue that the sheer crappiness of the representation unfairly diminishes perception of the author’s work.

    I think, rather, that your response to Cory Doctorow is the better way to go about it: that such a personal use as this is simply not infringement. Getting into the argument over TTS quality is a way of ceding this point to the AG, and I think it’s a losing proposition. That’s probably why Blount brings it up in the first place, as a blind.

    To Laroquad, in #82 and #87: I think the closest analogy would be hiring someone to read you a book, as elderly or ill people have been known to do. To the best of my knowledge, this venerable practice is still accepted as legal.

    Interpretive cues do not have to be stored like markup; in fact, exactly for these kinds of legal reasons, they probably will not be, or if they are, there will be a non-markup alternative very soon. Instead, they’ll just be stored as character references. e.g…

    INTERPRETIVE CUES FOR MOBY DICK

    Line 1: Pause before character 9.
    Line 2: Emphasise characters 9-15.

    Result: Call me *Ishmael*.

    There is no way that you can call that a derivative work, PERIOD. It does not contain a single character, period, or quotation mark from the original work. It may require access to the original work to decode, but SO DOES A CRITICAL REVIEW which is fair use.

    It may be intended to aid a performance of the original work, but so is a personal reader’s experience in reading that work, which is fair use.

    If I write you a ‘set of instructions’ for how to copy and distribute an eBook without pay, I may have violated the *DMCA* if I broke any encryption to do so, but I have not violated *copyright itself* by transmitting *those instructions* even though they are instructions that could be used to violate copyright. Because instructions for breaking copyright are not a derivative work, nor are they an unauthorised copy, even though they are intended to create an unauthorised copy. Therefore, if you have a set of tips listed as above for how to read a work aloud, whether meant for a computer or a human, it’s irrelevant: those tips in and of themselves simply *are not* a derivative work.

    Basically you people are mostly getting tangled up and confused in wholly irrelevant distinctions that really don’t have anything to do with copyright.

    Machine-performed vs. human? Irrelevant. Whether it’s written in human language or some markup language? Irrelevant. Whether it’s as good as a human reader or not? Irrelevant.

    A couple of you got it right: public vs. private, public vs. private, public vs. private. This is the only distinction that has been mentioned (besides that between written and performed works) that I have ever heard of as coming into play in real, live, in-the-courtroom, copyright law.

    On that score, the Authors Guild’s case doesn’t exist. The performance simply has not been performed until the computer speaks it, and the computer has no idea even what to perform until you personally give it the original work and a cue sheet. Therefore, until that moment, no performance has occurred.

    The idea that a ‘public performance’ has occurred here is ludicrous and should never pass the common sense smell-test.
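The Moby Dick cue sheet in the comment above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Cue` type, the `apply_cues` function, and the `[pause]` and `*` output markers are hypothetical, and character positions are taken to be 1-based and inclusive, which is what makes “characters 9-15” land on “Ishmael”. Note that the cue sheet itself holds only positions; the text is supplied separately by whoever owns a copy.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Cue:
    kind: str    # "pause" or "emphasis" (hypothetical cue kinds)
    start: int   # 1-based character position
    end: int     # 1-based, inclusive; same as start for a pause


def apply_cues(text: str, cues: Sequence[Cue]) -> str:
    """Return text with [pause] tags and *emphasis* markers inserted."""
    # Each insertion is (0-based index, tie-break rank, marker).
    # At a shared index the final left-to-right order is:
    # closing "*", then "[pause] ", then opening "*".
    inserts: List[Tuple[int, int, str]] = []
    for cue in cues:
        if cue.kind == "pause":
            inserts.append((cue.start - 1, 1, "[pause] "))
        elif cue.kind == "emphasis":
            inserts.append((cue.start - 1, 2, "*"))
            inserts.append((cue.end, 0, "*"))
        else:
            raise ValueError(f"unknown cue kind: {cue.kind!r}")

    # Insert from right to left so earlier indexes remain valid.
    out = text
    for index, _rank, marker in sorted(inserts, reverse=True):
        out = out[:index] + marker + out[index:]
    return out


# The cue sheet contains no characters from the book, only offsets.
MOBY_DICK_CUES = [Cue("pause", 9, 9), Cue("emphasis", 9, 15)]
```

Applied to a reader's own copy of the opening sentence, `apply_cues("Call me Ishmael.", MOBY_DICK_CUES)` returns `Call me [pause] *Ishmael*.`; without the text in hand, the cue list is just a handful of numbers.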

  90. At the risk of getting droney, let me add a final argument which, if I had thought of it earlier, I probably could have used to dispense with a lot of my most recent post:

    Imagine I am a personal reader. I read to people. It’s what I do. A lot of people want Stephen King’s The Stand read to them; so I read that a lot. I got so much business that I started a business of personal readers. Some of them were sub-par, so I wrote up a little guide to personal readers reading The Stand (among other works)… a guide that refers by page and paragraph to important words that should be emphasised, and why.

    You people who believe that publishing interpretive instructions is copyright infringement: are you saying that I could not publish my guide to performing The Stand without permission from Stephen King?

    Are you saying that the art of directing (which is essentially what we are talking about here) should be publicly muzzled to protect the art of writing?

    You obviously shouldn’t be saying that, and substituting the end personal reader with a machine instead of a human doesn’t change a thing.

  91. @Jake Frievald: sorry I read your response after I typed up all the rest — you have some good counterpoints and you’re right those analogies weren’t perfect. Perhaps instead of ‘not copyrightable’ I should have kept to ‘not an infringement’. However, my new arguments headed in a slightly different direction and don’t really require that you conclude anything is not copyrightable.

  92. Basically, if the Authors Guild succeeds in defining the ‘public’ transmission of a set of interpretive cues as copyright infringement, what’s going to happen to ‘public’ education? Will a teacher even be able to legally stand in a room and teach her students *how* to read a poem? Can she distribute a handout to accompany the poem with instructions on which lines to emphasise, etc.? Why doesn’t she need to apply for the same ‘interpretive rights’, above and beyond owning the text, that somebody publishing the *exact same information* over the net to be read by a machine will?

  93. Dude, is there a reason you couldn’t have put all of that into a single comment? I really dislike multiple sequential comments from the same person. Catch up on the whole thread before commenting.

  94. @Laroquod #93

    “There is no way that you can call that a derivative work, PERIOD. It does not contain a single character, period, or quotation mark from the original work.”

    That is almost hilariously off-base.

    Are you aware that Mickey Mouse’s image isn’t actually, itself, copyrighted? Steamboat Willie is, and Mickey Mouse’s image is protected because using him is the creation of a derivative work* of Steamboat Willie. Derivation doesn’t require copying. Not at all. Not in theory. Not in practice. Not in the law. (Keep in mind that these days, Mickey Mouse doesn’t look like the (as far as I know, unnamed) Mickey Mouse that was in Steamboat Willie.)

    Incidentally, the whole series of questions about “what if you try to edge around being declared a derivative work with this sneaky trick or that sneaky trick” are also off-base. American law** doesn’t recognize sneaky tricks. American law is very VERY results based. Laws are enforced primarily on their intended effect (as interpreted by judges) not on their explicit wording (though, obviously, wording guides judges’ beliefs about intended effect).

    *the creation of certain derivative works is not banned through copyright, under fair use

    **other countries’ judicial systems are far more literal, e.g. Japan, where gambling is technically illegal, but almost every single video-arcade has slot machines, and video-poker, and pachinko, because they formally separate the game-playing system from the redemption to cash system (that sort of thing doesn’t hold water in the U.S.A.)

  95. When the AI is ready for proper read back I intend to make Jerry Lewis or Lucille Ball read Old Man’s War to me. That’s not wrong, right?

  96. I’m not making the assertion that the text-to-speech is the deciding factor in buying the Kindle for many people, so I’m not backing up an assertion I’m not making. :) I’m making the assertion that the text-to-speech function helps to sell the Kindle and to make the Kindle slightly more attractive and more competitive than its rivals, which is why they put it into the machine. And that this is going to be a factor in the future, one that may affect more subsidiary rights than just audio. Which is why Neil Gaiman’s agent is arguing with Neil Gaiman about it. This was a contracts dispute, and the past contract disputes over rights use for audio and electronics are relevant. You may feel that this one was an unnecessary dispute, that Amazon did fair use. For me, I’m less sanguine about it. That’s probably my literary agent past coming back to haunt me, but as the electronic rights market grows, these disputes are going to be more common as the legal rules and business practices get worked out.

  97. I’m struggling with this. Saying that without authorization it’s illegal for a human to read someone’s book for profit but it’s legal for a computer to read the book for profit (i.e. the computer-maker) seems silly just because the computer can’t do as good of a job. It isn’t legal for computers to print out copies of artwork to be sold just because they don’t do as good of a job at the reproduction as a talented painter can.

    On the other hand, I really hate the nickel-and-dime concept, that each and every possible way something can be used has to be individually licensed. I’ve heard the movie industry is working on ways to make you pay for the ability to fast-forward. You buy the book, you should be able to read it how you like, or have it read to you.

    I don’t think (but am not sure) it’s illegal to hire someone to read the book to you, so why should it be illegal to buy some program to read the book to you?

    On the other hand, I really do think this will cut into audio-book sales. You laugh at the ability of computer-synthesized speech, but I laughed at the idea of people watching movies on 2″ screens. People will do it if it’s cheap, and free is about as cheap as it gets.

  98. Thanks, Scott (99), that’s the kind of question I had. Do you have knowledge of this through your occupation, or is there someplace you can recommend I go to read up on this specific issue?

    TV (100), that is SO wrong. But not everything that’s wrong is illegal.

    KatG (101), you say this: I’m not making the assertion that the text-to-speech is the deciding factor in buying the Kindle for many people… and then say this: I’m making the assertion that the text-to-speech function helps to sell the Kindle and to make the Kindle slightly more attractive and more competitive than its rivals…

    Dissect that for a moment. If the TTS function is not the deciding factor in buying a Kindle for at least some people, then adding it would not benefit Kindle sales. However, you assert that the TTS function makes the Kindle slightly more attractive and competitive; for that to be true, the TTS function must be the deciding factor for some people. N people would buy Kindle 2 without the TTS feature, while N+M people would buy it with the TTS feature. So you are, indirectly at least, asserting that TTS is the deciding factor for some Kindle sales.

    M, to John’s point, is probably *tiny*. In which case, so what?

    Mark (102): Saying that without authorization it’s illegal for a human to read someone’s book for profit but it’s legal for a computer to read the book for profit (i.e. the computer-maker) seems silly just because the computer can’t do as good of a job.

    It’s legal for a human to read a book for profit — it just has to be a private performance. (Think nannies reading to children.) It’s legal (or should be) for a computer to read the book for profit — in a private performance.

    If you made a podcast of the Kindle reading Old Man’s War, that would violate copyright. It would still be a crappy recording, but its crappiness has no bearing on the copyright violation.

    The Kindle’s crappiness is a pragmatic issue. It won’t significantly eat into audiobook sales — wouldn’t even if illegal Kindle-podcasts abounded — because audiobooks are such better performances of the text. I think that was John’s point. (He’ll thwack me with his clue-stick if I’m wrong.)

  99. Actually I believe the fact that it won’t do inflections is the strength of it, not a weakness. I’d say 50% of audiobooks I’ve heard are awful, awful, awful. I’d rather have no inflections or “acting” than have it really bad.

    But as a blind person wrote in the other thread, or as I wrote about reading with an actor, you want to do it with no inflections for the benefit of the listener.

    There of course will always be a place for books read by actors. But what I think it would replace is audiobooks for textbooks, and weed out all the crap audiobooks made just to have something to listen to in your car, and what might happen is it creates a new genre of audiobook made very well and for entertainment.

  100. There’s a difference between a bonus feature that enhances marketability and sales potential and a deciding sales point for all comers, but I’ll drop the argument, lol. Booksquare is making an interesting point that Amazon may have actually won this dispute because now it has further legal argument for keeping the DRM that locks Kindle buyers into only buying from Amazon, and she worries that the Authors Guild will then campaign to keep DRM in place. I think this would be a really bad idea. Kindle’s DRM feature is what makes me have little interest in getting one (even if I could afford it.)

  101. The computer programmer who adds the markup to Zoe’s Tale should be paired with an experienced reader so that the performance sounds authentic. If this does not happen, it is management’s fault, and we should not let them pass that buck.

    -A Computer Programmer
    (One of the few who could probably make the correct markup unaided.)

  102. Bottom line: I want a choice. I buy books and I buy audio books. Sometimes I want to have an ebook read to me. Had I known Amazon was going to do this, I would not have purchased a Fire and given my Kindle 2 away.
