OPINION: Where are we going?

What will a post-Census online genealogy research world look like?

Genealogy has been propelled to new heights during the past decade by the availability of digital databases such as those found on Ancestry.com and Genealogy.com. In my view, this industry is maturing, as evidenced by the wide availability of US and UK census collections.

Ancestry.com is not alone in having a full US census collection. HeritageQuest (available only through libraries) has one too. Footnote has published the 1860 and 1930 censuses. I recently found most of the censuses between 1850 and 1900 on FamilySearch.org, with linked images. The fee to access this information on that site is … free.

Ancestry’s recent IPO appears to have prompted some “me too” decisions by other companies to create similar sites. Footnote has announced that it will complete the US census, and other companies are quietly working on digitizing census records. For the genealogy researcher, the impact is that prices will fall until they reach … free.

And what will happen then? Large genealogy publishers will not want to close up shop. They will need to publish something else. There is a significant amount of immigration and military record data going online.

The titles that publishers digitize fall into two broad categories. Some, like census records and Civil War pensions, are complete collections in one place, held by a single records custodian such as the National Archives. Others, like gravestones, are fragmented across dozens, hundreds, or thousands of places, with many records custodians.

The challenge for publishers is that they are rapidly approaching the point where all of the “complete collections” for markets like the US and UK are online, and the day that they’ll be free is clearly coming. They have little choice but to go forward with fragmented titles.

American researchers have been somewhat spoiled by the availability of Ancestry’s census collection. One web site. Every record. In the UK, the census was split across four sites until Find My Past recently completed a UK census collection. UK researchers can tell their American colleagues what happens next.

I’ll speculate about three kinds of fragmented US records that I think will be increasingly available: vital records, gravestones, and historical newspapers.

Vital records are kept by the states. This complicates matters for publishers, because the rights and the media come from 50 places instead of one. In truth, it’s more complicated still. In some states, the counties are the records custodians. In states such as Massachusetts, the state holds records only after a certain date, and many towns hold unique collections of their own. There is no central index of vital records for the US. That’s one of the attractions of the census: it serves as an implied birth record, showing the child with the parents and the child’s age at a certain date. The Social Security Death Index (SSDI) is a nice index of deaths, but it has very few entries for deaths before 1962, and many people did not participate in Social Security (such as state employees like teachers, and federal employees like … Social Security workers!).

Gravestones are about as fragmented as a collection of records can be. They are literally all over the US. Many sites, both paid and free, contain some gravestones, but there is no centralized index of them. Gravestones make good vital records, often containing birth dates, death dates, and even the names of relatives.

Newspapers were once the primary means of distributing and receiving news, and there were many more newspapers in the past than there are today. Some have been lost. Some are owned today by large publishers such as Gannett or Hearst. Many have been collected by NewsBank and are available to genealogy researchers through GenealogyBank.

Creating a central index over fragmented records poses two problems. The first is achieving any degree of thoroughness; the second is achieving a degree of consistency. Not all vital records contain the same information. Many existing databases contain partial information extracted from the records, and the fields in one database are not the same as those in another.

The same is true of gravestone databases available today.

And newspapers? For the most part, newspapers are OCR’d as a large “text blob” for each page. If you search for “Brown,” you will find the word used in many contexts, most of them not as a family name. If you’re lucky enough to be a Sharbrough, your searches that turn up exact hits are always interesting, but if the OCR quality or the spelling isn’t just right, you won’t be able to locate the articles you want. In short, current newspaper processing methods don’t identify name parts, date parts, connections between persons and employers, and the like.

There is a great deal of useful genealogical information left for publishers to sell us. At present, it would seem that they have a difficult decision to make. Perhaps it would be helpful if researchers could tell publishers more about what kinds of records they want most, and in what format they want to find, organize, and share them.

3 Responses to “OPINION: Where are we going?”

  1. [...] the US census collection becomes freely available. I blogged about it over at the RootsWorks blog [link]. If you are interested, or have comments on the topic, that site is devoted to technology and [...]

  2. Hi Beau,

    I hate to tell you, but your second paragraph contains some untrue information. I’m STILL waiting for one of the online genealogy database entities to complete the digitization of all the census images. So far, none has. There are other census documents that NARA has microfilmed but that have not been made available in digitized form. These include agricultural schedules, manufacturing/industry schedules, the Defective, Dependent and Delinquent Classes schedules of the 1880 census, social statistics, the Indian schedules, the enumeration district descriptions and maps, and a few other documents. Only when these are digitized as well will we have a complete and comprehensive collection of U.S. federal census materials for our research.

    I’ve nagged Ancestry.com for years to “complete” the census collection, but they have not done so. Nor has HeritageQuest, Footnote.com, FamilySearch.org, or any other organization approached these important documents. Not everything of importance is on population schedules. Agricultural schedules, for example, paint a detailed picture of a farming family’s livestock, crops, lumber production, mining, and even how much butter and honey they produced. What a tremendous contextual view into the farming family’s life!

    Until 100% of the census documents have been digitized, I will keep visiting libraries, the Family History Library, the Allen County Public Library, LDS Family History Centers, and even NARA when I can.

    The census digitization is still far from complete. Help spread the word and exert whatever pressure you can as well!

    Thanks!

    George

  3. sharbrough says:

    If you check out the post entitled “Beyond the Free Census” [link], you’ll see that I agree with you. Most people don’t know that the population schedules are not the entire census. Most publishers like it that way.
