Thursday, September 15, 2011

FYI France : Project Gutenberg and the passing of Michael Hart

September saw the death of one of the unlikely triumvirate who have done perhaps the most to envisage and engineer our transition from print to digital text: Michael S. Hart, doyen, founder of Project Gutenberg which now operates at 36,000+ free-etexts-strong -- disparu September 6 at age 64.

As for the other two -- the creative corporate maelstrom involving Larry Page & Sergey Brin & two sisters named Wojcicki & the Stanford Digital Library Technologies Project & Marissa Mayer's stopwatch, & called "Google" or the "Books" part of that anyway --

http://books.google.com/intl/en/googlebooks/history.html

-- and Amazon.com's amazing Jeff Bezos -- well, both of those other two still are very much alive and thriving and busy building their respective World's Largest Libraries Of Digital Texts So Far. And I earnestly hope Project Gutenberg will persevere now, too, in spite of its loss -- along with the massive efforts of HathiTrust, and Gallica, and the millions of other digital text repositories all blossoming now... sprouting on every institutional hard drive, and soon surging forth on or via every individual's mobile...

Michael personally, though, will be missed. His refreshing naïveté -- every effort needs one pioneer, at least, who believes fervently in a world without money -- Hart's particular vision balanced the more worldly realisms of the other two undertakings well. Every pioneering effort needs a variety of flavors, and online digital text has benefited greatly from his.

What follows below here, then, is an excellent and interesting eulogy of Michael by Hervé Le Crosnier, translated by me from the latter's elegant French which also is included -- online digital text has been a trans-national and cross-cultural effort, as Hervé's observations here and his own long involvement in all of this both attest --

[* American English version; see the French version below]

"Project Gutenberg is Orphaned: the death of Michael Hart"

by Hervé Le Crosnier, translation by Jack Kessler

Michael Hart died on September 6 at the age of 64. He will hold his place in the history of digital culture as the founder of Project Gutenberg, a major cooperative undertaking dating from the beginnings of the Internet, which created a gigantic database of digitized books made available to users on a shared basis.

It was forty years ago, in July 1971, that the young Michael Hart received his pass for time-shared use of the Xerox computer at the University of Illinois at Urbana-Champaign. With just a normal math background, Hart wondered what he might usefully accomplish using such a tool, one at that point limited to a single all-capitals character set and very slow by comparison to today's computers.

He used his time to make a copy of the US Declaration of Independence, all the while dreaming the dreams of universal libraries launched by the founding fathers of digital information such as Vannevar Bush, Joseph Licklider and Ted Nelson. The file which resulted was only 5 KB -- but he was forced to abandon his original idea of sending the text to the 100 or so users then possessing an Arpanet address, as that would have choked that entire network.

So he stored it on a server and made it available for free downloading -- with no hypertext links, since forty years ago those hadn't been invented yet. Even though only six users took him up on his offer, the first electronic book of the digital information era had seen the light of day. This was in any event the most expensive digital book in history -- Hart calculated his computer-time for its production and estimated the cost of that at one million dollars.

Hart continued with his effort to make the greatest possible number of digital books available. Even though the first texts were difficult to read, lacking typography, showing only capitals, with no page formatting... he never strayed from his initial aim of making the works available to all. To this end he drew upon an essential characteristic of digital text: reproduction and distribution on the networks cost nearly nothing, and they cost less and less as the machines and communications channels grow better at what they do.

As Hart wrote last July, "One thing about eBooks that most people haven't thought much is that eBooks are the very first thing that we're all able to have as much as we want other than air". And he anticipated future uses beyond just reading, such as textual analysis, word-comparison, fulltext search, and linguistic and stylistic studies, all computer-assisted.

For a long time Hart's credo was "plain-vanilla ASCII": he avoided all page-formatting so that texts might be accessible to all machines, and all users. This led Project Gutenberg volunteers to adopt a strange system of accents, placing those next to the letters they affected. Hart's wariness of HTML, however, disappeared when the Web became the principal means of distributing digital texts: universality now came by way of markup -- and the use of UTF-8, the character encoding standard which makes it possible to write in most of the world's languages.

As with his project, and his vision, Hart was generous and inspiring; he possessed a grand sense of conviction, a sound grasp of the organization needed for his radical project -- he knew how to assemble millions of volunteers to accompany him on his adventure of digitizing the knowledge in the world's books. These were volunteers who began by hand-typing their texts, then moved on to scanning and optical character recognition, always urged to proofread with care.



We often stand amazed before the giant industrialized projects of digitization. But let us reflect on the capacities offered by a coordinated mobilization of millions of volunteers. The construction of an open commons where all can share is a dream of many -- a project in which each can participate, each at their own level, in the construction of something greater than them all.

In the magazine Searcher in 2002, Michael Hart described this situation as a true change of paradigm: "It's the power of one person, alone in their basement, being able to type in their favorite books and give it to millions or billions of people. It just wasn't even remotely possible before...".

The personal will-power of Michael Hart enabled him to pursue his great project throughout his life. Even though it was 1994 before the hundredth text was available -- the Complete Works of Shakespeare -- just three years later the Divine Comedy of Dante became the thousandth. Project Gutenberg, with its 37,000 books in 60 languages, is now one of the principal sources of free digital books accessible in current formats -- epub, mobi... -- for e-readers, tablets, smartphones, and of course for the Web.



The assembled and proofread texts are distributed free-of-charge and for any use. Being free of charge is only one aspect of the book access provided by Project Gutenberg: the texts also may be transmitted, re-edited, re-formatted using different tools, used in teaching or in other activities... This is the "public domain" in its full sense: not just a guarantee of "access" -- it is even more, a full "use".

Which is also the best way to protect "free" access: among the re-uses, even if some of those are commercial, in the sense of involving some supplementary added-value, there will always be at least one which aims simply at free distribution.

That is a lesson to consider, for all the institutions which now are charged with the public distribution of works which are in the public domain. Digitization should not add additional barriers to any use of the text, commercial uses included... these often offer an improved "rehabilitation" of classical or forgotten works...

At a time when the British Library has just signed a contract with Google limiting certain uses of the files thus obtained -- or the Bibliothèque nationale de France is adding a mention of "propriété" to digital works in the public domain distributed via Gallica... -- a reminder of the line Michael Hart always took on all this would be a good idea.



Michael Hart's strong character, his work-ethic and his capacity for mobilizing volunteers around his efforts, remain in our memory. The journals which have announced his death justifiably write of the "creator of the first electronic book".

That however is too simple. It is above all Michael Hart who placed the book back at the heart of the information-sharing model for the Internet. It was his full awareness of the need to protect the public domain -- protection against the creation of new enclosures, by technical means or commercial contract -- which drove the creation of his Project Gutenberg. Michael Hart never ceased defending a vision of the book as an organizer of exchange, of knowledge and of emotions, among individuals -- he mobilized volunteers for this effort, the building of a network of all who love to read and to share what they read.



Hervé Le Crosnier
Caen, September 10, 2011
Text distributed under a Creative Commons license

http://blog.mondediplo.net/2011-09-11-Le-projet-Gutenberg-est-orphelin (original posting, in French)


--oOo--


[* version en français]

Le projet Gutenberg est orphelin : décès de Michael Hart

Michael Hart est décédé le 6 septembre, à l'âge de 64 ans. Il restera dans l'histoire de la culture numérique comme le fondateur du « projet Gutenberg », un projet coopératif majeur datant des débuts de l'internet et ayant réussi à créer un gigantesque fonds de livres numérisés offerts en partage.

Il y a quarante ans, en juillet 1971, le jeune Michael Hart reçoit son sésame pour utiliser, en temps partagé, l'ordinateur Xerox de l'Université d'Illinois à Urbana-Champaign. Peu versé sur le calcul, il se demande ce qu'il pourrait bien faire d'utile à la société à partir d'un tel outil, limité, n'utilisant qu'un jeu de caractères en capitales, et très lent en regard des ordinateurs d'aujourd'hui.

Il utilisera son temps pour recopier la « Déclaration d'Indépendance » des États-Unis, en songeant aux idées de bibliothèques universelles lancées par les « pères fondateurs » de l'informatique, notamment Vannevar Bush, Joseph Licklider ou Ted Nelson. Le fichier pesait seulement 5 kilo-octets, mais il dut renoncer à sa première idée d'envoyer le texte à la centaine d'usagers ayant une adresse sur Arpanet, car cela aurait bloqué tout le réseau.

Il le mit donc en dépôt sur un serveur pour un libre téléchargement (sans lien hypertexte, une notion qui n'existait pas il y a quarante ans). Même s'ils ne furent que six à profiter de l'offre, on considère que le premier « livre électronique » du réseau informatique avait vu le jour. Ce fut d'ailleurs le livre numérique le plus cher de l'histoire, Michael Hart ayant un jour calculé une valeur approximative de son accès à l'ordinateur et l'évaluant à 1 million de dollars.

Michael Hart a continué sur sa lancée pour rendre disponible la plus grande quantité de livres possible. Même si les premiers textes étaient difficilement lisibles, sans typographie, en lettres capitales, sans mise en page,... il n'a jamais dévié de sa volonté de rendre les œuvres disponibles à tous. Pour cela, il s'appuyait sur une caractéristique essentielle du document numérique : la reproduction et la diffusion via le réseau ne coûte presque rien, et même de moins en moins quand les machines et les tuyaux deviennent plus performants. Comme il l'écrivait encore en juillet dernier, « à part l'air que nous respirons, les livres numériques sont la seule chose dont nous pouvons disposer à volonté ».

Et il anticipait sur les usages à venir au delà de la lecture, comme l'analyse du texte, la comparaison de mots, la recherche par le contenu, l'établissement de correspondances ou les études linguistiques ou stylistiques assistées par l'ordinateur.

Longtemps son credo fut celui du « plain vanilla ascii », c'est à dire de refuser toute mise en page afin que les textes soient accessibles à toutes les machines, par tous les utilisateurs. Ceci conduisait les volontaires du projet Gutenberg à un codage particulier des accents, placés à côté de la lettre concernée. Mais sa méfiance devant HTML a disparu quand le web est devenu le principal outil de diffusion des écrits numériques : l'universalité passait dorénavant par le balisage, et l'utilisation de UTF-8, la norme de caractères qui permet d'écrire dans la plus grande partie des langues du monde.

Comme son projet, disons même sa vision, était généreuse et mobilisatrice ; comme il possédait un grand sens de la conviction et de l'organisation et proposait un discours radical, il a su regrouper des millions de volontaires pour l'accompagner dans sa tentative de numériser le savoir des livres. Des volontaires qui ont commencé par dactylographier les textes, puis utiliser scanner et reconnaissance de caractères, mais toujours incités à une relecture minutieuse.

On est souvent de nos jours ébahi devant les projets industriels de numérisation. Nous devrions plutôt réfléchir à la capacité offerte par la mobilisation coordonnée de millions de volontaires. Construire des communs ouverts au partage pour tous répond aux désirs de nombreuses personnes, qui peuvent participer, chacune à leur niveau, à la construction d'un ensemble qui les dépasse.

Dans le magazine Searcher en 2002, Michael Hart considérait cette situation comme un véritable changement de paradigme : « il est dorénavant possible à une personne isolée dans son appartement de rendre disponible son livre favori à des millions d'autres. C'était tout simplement inimaginable auparavant ».

La volonté de Michael Hart lui a permis de poursuivre son grand œuvre tout au long de sa vie. S'il fallut attendre 1994 pour que le centième texte soit disponible (les œuvres complètes de Shakespeare), trois ans plus tard la Divine Comédie de Dante fut le millième.

Le projet Gutenberg, avec ses 37000 livres en 60 langues, est aujourd'hui une des sources principales de livres numériques gratuits diffusés sous les formats actuels (epub, mobi,...) pour les liseuses, les tablettes, les ordiphones, et bien évidemment le web. Les textes rassemblés et relus sont mis à disposition librement pour tout usage. La gratuité n'est alors qu'un des aspects de l'accès aux livres du projet Gutenberg : ils peuvent aussi être transmis, ré-édités, reformatés pour de nouveaux outils, utilisés dans l'enseignement ou en activités diverses...

Le « domaine public » prend alors tout son sens : il ne s'agit pas de simplement garantir « l'accès », mais plus largement la ré-utilisation. Ce qui est aussi la meilleure façon de protéger l'accès « gratuit » : parmi les ré-utilisations, même si certaines sont commerciales parce qu'elles apportent une valeur ajoutée supplémentaire, il y en aura toujours au moins une qui visera à la simple diffusion. Une leçon à méditer pour toutes les institutions qui sont aujourd'hui en charge de rendre disponible auprès du public les œuvres du domaine public.

La numérisation ne doit pas ajouter des barrières supplémentaires sur le texte pour tous les usages, y compris commerciaux... qui souvent offrent une meilleure « réhabilitation » d'œuvres classiques ou oubliées. Au moment où la British Library vient de signer un accord avec Google limitant certains usages des fichiers ainsi obtenus, où la Bibliothèque nationale de France ajoute une mention de « propriété » sur les œuvres numérisées à partir du domaine public et diffusées par Gallica... un tel rappel, qui fut la ligne de conduite permanente de Michael Hart, reste d'actualité.

Le caractère bien trempé de Michael Hart, sa puissance de travail et sa capacité à mobiliser des volontaires autour de lui resteront dans notre souvenir. Les journaux qui ont annoncé son décès parlent à juste titre de « créateur du premier livre électronique ». C'est cependant réducteur. Il est surtout celui qui a remis le livre au cœur du modèle de partage du réseau internet. C'est la pleine conscience qu'il fallait protéger le domaine public de la création des nouvelles enclosures par la technique ou par les contrats commerciaux qui a animé la création du Projet Gutenberg. Michael Hart n'a cessé de défendre une vision du livre comme organisateur des échanges de savoirs et des émotions entre des individus, mobilisant pour cela des volontaires, le réseau de tous ceux qui aiment lire ou faire partager la lecture.

Caen, le 10 septembre 2011
Hervé Le Crosnier

Texte diffusé sous licence Creative Commons

http://blog.mondediplo.net/2011-09-11-Le-projet-Gutenberg-est-orphelin

--oOo--

And the following is a personal note about Michael which I myself sent out -- somewhat in shock, I am getting perilously-close to age 64 now too -- to the excellent Exlibris list of which I have been a member since its beginnings, where another member just had told the rest of us the sad news --

Re: [EXLIBRIS-L] Michael S. Hart, founder of Project Gutenberg has died
Date: September 8, 2011 11:49 AM
From: Jack Kessler
To: [EXLIBRIS-L] Rare book and manuscripts



"Michael prided himself on being unreasonable..."

I must say, to all who knew him, that Michael Hart was among the most cordial of the various strong-minded people whom I have met and with whom I have corresponded, online.

He and I discussed various topics, over the years. Michael always was receptive to, even curious about, another person's ideas. For a crusader this is rare, or for any unreasonable nonconformist, if that is what he was -- it accounts perhaps for his great success at attracting other people to his projects and getting good work out of them. Michael was really good with some people.

He certainly was with me. His Project Gutenberg was an early inspiration -- one of the earliest -- in a world which largely, at that time, never even had heard of "digital texts".

That we all take the phenomenon so much for granted, now, is due too often to our own blindness; that the new medium might offer a little high-quality content -- might tell good stories, preserve and perpetuate valuable memories, and do so inexpensively or even *gasp* "for free" -- is due greatly to the foresight and efforts of a very small set of early digital-texts pioneers, and Michael S. Hart definitely was one of those.

Jack...

--oOo--

And, finally, a Codger-Note:

Those of us who love books -- not instead-of but as-well-as and in-addition-to digital texts, also the Internet and book bindings and iPhones and mise en page and incunabula and the rest -- need to know and understand that nothing lasts forever... But it sure didn't seem that way, back when all this digital stuff began, not so very long ago, or it didn't about the brand new and shiny "digital" part of it all anyway.

Michael Hart was young, back then -- Sergey and Larry were even younger -- "bliss was it in that dawn to be alive, but to be young was very heaven" -- Bill Gates was young, Steve Jobs was young... If all this is to transition well to future generations, though, we need to acknowledge our own mortalities, make strenuous efforts to record the history, draw from our experiences the lessons we believe we have learned -- as Hervé has drawn them above, here, from Michael Hart and his fascinating Project Gutenberg -- in part because all that history is interesting, and at least so that the most egregious errors and cost and other overruns we have made, and there have been many, might not be made again.

It's all going to change. It has to. That is how Cerf & Kahn's tcp/ip and Hart's Project Gutenberg and GoogleBooks and Kindle and the iPhone and the Internet *cloud* all got to us in the first place: through change and by defying previous oldsters and their older paradigms, received opinions, conventional wisdoms.

But I've read somewhere that Google's gone and purchased their particular Palo Alto "garage"... the place where Sue Wojcicki rented space to Sergey & Larry for their early experiments... So, that's a Good Thing, I believe -- someday we'll all want to know, and be reminded.

And so I hope someone is holding onto what we have from Michael Hart and Project Gutenberg, too. It would be a shame if all our collective memory of this Digital Age's beginnings were to turn out to have been ephemera -- even worse than previous ephemera, this is likely to be digital ephemera, and for that it is far too easy to press a wrong button and *global delete*, and then the process of forgetting will commence.

Now, where did I put my glasses...


Jack Kessler, kessler@well.com