[This article originally appeared in Genealogical Computing]
While it's a mature specification, the use of GEDCOM by genealogists is not dead. Several current projects aim to improve the way that GEDCOM is used by family historians for transfers of their data sets. There is an email mailing list, GEDCOM-L, devoted to the topic, though it doesn't see much activity. See the sidebar for a link to information about this list.
GEDCOM is an acronym for Genealogical Data Communication. Technically, it is a specification for a lineage-linked database. In practice, the word GEDCOM is also used to refer to data files whose formats follow the specification.
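For readers who have never opened one of these files, here is a rough sketch of what the inside looks like. Each line carries a level number, an optional cross-reference ID between @ signs, a tag, and an optional value. The sample record and the name parse_line are made up for illustration, and this is a minimal sketch under those assumptions, not a complete reader for the specification.

```python
import re

# A made-up sample record in GEDCOM's line-oriented layout.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
2 PLAC Boston, Massachusetts
"""

# Pattern pieces: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def parse_line(line):
    """Split one GEDCOM-style line into (level, xref, tag, value)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a GEDCOM-style line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


records = [parse_line(line) for line in SAMPLE.splitlines()]

# Indent each tag by its level to show the nesting the levels imply.
for level, xref, tag, value in records:
    print("  " * level + tag, value)
```

The level numbers do the work that indentation or brackets do in other formats: the DATE and PLAC lines at level 2 belong to the BIRT line at level 1 above them.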
GEDCOM is 15 years old, and just about everyone has heard of it. Every genealogical software program reads and writes GEDCOM files. A search of the internet for "GEDCOM" produced 100,000 hits. For comparison, a search for "Genealogical Computing" produced fewer than 9,000 hits. There are more GEDCOM files on the internet than you can shake a stick at. People who haven't heard of GEDCOM are also often seen watching in rapt attention, open-mouthed, as flight attendants demonstrate the basic operation of the seat belt. You can hear them saying, "Aha! That's exactly what I thought that thing was!" Still, hearing about it and understanding it are, as we say in the South, altogether entirely different things.
The Family History Department of the Church of Jesus Christ of Latter-day Saints is the creator and owner of the GEDCOM specification. The original purpose of the specification was for the submission of information for temple sacraments. Version 1.0 came out in 1985, and the current version, 5.5, came out in January of 1996. Please notice that this date is less than one year after the invention of the World
Duct tape is holding the South together. While it was originally made for taping ventilation ducts, it has been adapted to many other uses. Texans find most of them to be the highest forms of cultural expression. People often joke about the variety of situations in which we'll say, "Duck it," which is Texan for "Repair it with duct tape." Nowadays you can get it in a variety of colors, which one suspects is meant to increase its appeal as a general household repair material in the eyes of the fairer sex.
According to Robert Burns, the best laid plans of mice and men gang aft aglay. This is also the fate of many GEDCOM transfers. In general, the "BMDB" (birth, marriage, death, and burial) information is effectively transferred from every program to every program. The biological relationships of parents and children are also handled well by current products. When it comes to information beyond that point, the effectiveness of GEDCOM varies greatly. The GENTECH GEDCOM TestBook project offers more details. This project needs more volunteers and more versions of programs in order to become a really useful resource.
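To make the "BMDB" and parent-child part concrete, here is a hedged sketch of how those pieces sit in a file. The tags BIRT, MARR, DEAT, BURI, HUSB, WIFE, and CHIL come from the specification; the people, the dates, and the function names read_records and parents_of are invented for illustration. The reader below only handles the shape of this sample (it would choke on header and trailer lines, for one), so treat it as a sketch rather than a working importer.

```python
# Made-up GEDCOM-style data: two parents, one child, one family record.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
1 DEAT
2 DATE 3 MAR 1910
0 @I2@ INDI
1 NAME Mary /Jones/
0 @I3@ INDI
1 NAME Ann /Smith/
1 BIRT
2 DATE 5 MAY 1875
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
1 MARR
2 DATE 10 JUN 1872
"""

EVENT_TAGS = {"BIRT", "MARR", "DEAT", "BURI"}  # the "BMDB" events
LINK_TAGS = {"HUSB", "WIFE", "CHIL"}           # pointers to individuals


def read_records(text):
    """Group lines into records keyed by cross-reference ID."""
    records = {}
    current = None
    event = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # e.g. "0 @I1@ INDI": parts[1] is the ID, parts[2] the type.
            current = {"type": parts[2], "names": [], "events": {}, "links": []}
            records[parts[1]] = current
            event = None
        elif level == 1:
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
            # Remember the event so a following DATE line can attach to it.
            event = tag if tag in EVENT_TAGS else None
            if tag == "NAME":
                current["names"].append(value)
            elif tag in LINK_TAGS:
                current["links"].append((tag, value))
        elif level == 2 and event and parts[1] == "DATE":
            current["events"][event] = parts[2]
    return records


def parents_of(records, child_id):
    """Follow family records to find the parents of one individual."""
    parents = []
    for record in records.values():
        if record["type"] == "FAM" and ("CHIL", child_id) in record["links"]:
            parents += [ref for tag, ref in record["links"] if tag in ("HUSB", "WIFE")]
    return parents


records = read_records(SAMPLE)
print(records["@I1@"]["events"])
print(parents_of(records, "@I3@"))
```

This is the "lineage-linked" idea in miniature: individuals never point at each other directly, only at a family record that ties them together. It is also the part of the file that nearly every program agrees on, which is why it survives a transfer.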
A good story tells who, what, when, where, why, and how. History is no exception to this rule. Please notice that three of these elements involve that Unholy Trinity of genealogical data problems: Names, Dates, and Places. There is no standard -- in or out of GEDCOM -- that dictates how these elements are to be entered or stored. It is difficult for us to tell a good story about our ancestors and store those deeds and woe in a GEDCOM format.
The problems are not limited to the content of the file. Some problems result from the form of our sources. As time goes by, people find new things that they'd like to put into their databases. Multimedia objects, for example, make up a large part of our computer records. We have images that we've scanned, downloaded, and emailed. We have correspondence and emails that support our thought processes. We have digital audio and video, links to websites, and more. Many of these elements have been created after the most recent version of GEDCOM was released, so we adapt the spec to our own purposes. When we throw the kitchen sink into the GEDCOM suitcase, its seams bulge. We duck it, using GEDCOM colored tape. The women are not fooled.
This would be a good time to suggest that, when exporting, users not put information about living people online, and that, if asked to remove such information, they comply quickly.
The management of GEDCOM goes back to purpose. GEDCOM works perfectly -- for what it is designed to do. That is to communicate the specific information required for Mormon sacraments. It's not a surprise that the Church does not have a religious requirement for storing certain types of life events and citations such as census information, tax rolls, and military records. Despite the name, GEDCOM is not a genealogical specification -- it's a religious database.
As of this writing, there is no specification for a strictly genealogical database. Such a document does not exist partly because there is no body of authority to create, endorse, and manage a specification. Before concluding that the situation is dire and hopeless, it would be good to point out that exchanging data is a fancy way of sharing hints anyhow - hints about where to research next - and GEDCOM works very well for that purpose.
Convention is a powerful tool. It requires cooperation, communication, and common goals. There is a growing number of websites containing how-to tips, creating a body of conventional use.
One of those is the GENTECH GEDCOM TestBook, a GEDCOM transfer clearinghouse project. The current Project Manager is Prof Evan Ivie of BYU. Exchange of a standard story from one program to another is tested and reported. Users and developers can submit comments about workarounds for problems they have. The results of the exchange projects are published on the GENTECH website.
Beyond that, there are numerous other how-to sites on the web that address GEDCOM issues, including RootsWeb, which hosts the WorldConnect project. As of this writing, that project contains almost 28 million names, and is growing at a rate of approximately 3 million names each month.
Also, one can find articles about GEDCOM online written by Jan McClintock, Dick Eastman, and myself.
In a technical sense, GEDCOM is dead. It will be replaced, and soon, by XML (Extensible Markup Language), which is related to HTML (Hypertext Markup Language), the language of the World Wide Web. A new standard will be developed, and soon, that will gain acceptance as the format for publishing genealogical information on the internet.
Markup languages have tags around text, such as <NAME>John Smith</NAME>. They generally feature an opening and a closing tag, in neat pairs. It is my belief that this format will gain wide acceptance, not only as a method for publishing on the internet, but for exchanging the data directly.
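The pairing of tags is easy to see with a few lines of Python and its standard XML parser. The NAME tag is the one from the example above; the PLACE tag and the helper is_well_formed are my own inventions for the sake of the demonstration. This is a small sketch, not a proposal for any particular set of tags.

```python
import xml.etree.ElementTree as ET

# The parser hands back the tag and the text it encloses.
element = ET.fromstring("<NAME>John Smith</NAME>")
print(element.tag, "->", element.text)


def is_well_formed(markup):
    """Return True if every opening tag has its matching closing tag."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<NAME>John Smith</NAME>"))   # True
print(is_well_formed("<NAME>John Smith</PLACE>"))  # False
```

A program reading the second string knows at once that something is wrong, without having to guess what the author meant.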
One interesting XML project is the Text Encoding Initiative. Quoting from the web site, "The Text Encoding Initiative (TEI) is an international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research, and to satisfy a broad range of uses by the language industries more generally." What it DOES do is provide a way to code names, dates, and places. One can adopt it in a variety of ways. A general tag can identify its contents as a name, whole and unbroken. A detailed nest of tags might identify all of the name parts such as first names and last names and titles and so on.
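The two levels of detail might look something like the sketch below. The element names (persName, roleName, forename, surname) are modeled on TEI's name elements, but please treat the markup as illustrative rather than as a validated TEI document. The sample name and the helpers full_name and surname_of are made up.

```python
import xml.etree.ElementTree as ET

# A general tag: the name is marked, whole and unbroken.
GENERAL = "<persName>Capt. John Smith</persName>"

# A detailed nest of tags: each part of the name is identified.
DETAILED = (
    "<persName>"
    "<roleName>Capt.</roleName> "
    "<forename>John</forename> "
    "<surname>Smith</surname>"
    "</persName>"
)


def full_name(markup):
    """Return the name as plain text, whatever the tagging depth."""
    root = ET.fromstring(markup)
    return " ".join("".join(root.itertext()).split())


def surname_of(markup):
    """Return the surname if it was tagged, otherwise None."""
    node = ET.fromstring(markup).find("surname")
    return node.text if node is not None else None


print(full_name(GENERAL), "|", surname_of(GENERAL))
print(full_name(DETAILED), "|", surname_of(DETAILED))
```

Both versions read the same to a person. Only the detailed one lets a program pick out the surname without guessing, and that is the trade: more typing up front for less guessing later.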
Imagine how much easier it would be for a search engine to locate citations for Andersons born in Massachusetts before 1700 if it could look for tags instead of having to read the text and guess which information is the birth date and place, and which name they belong to.
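Here is a rough sketch of that Anderson search. No such standard exists yet, so every tag and attribute below (people, person, name, forename, surname, birth, date, year, place) is hypothetical, as are the three people and the function find_births. The point is only that the program looks things up by tag rather than guessing from the text.

```python
import xml.etree.ElementTree as ET

# Made-up records in a hypothetical tag scheme.
DATA = """
<people>
  <person>
    <name><forename>Ruth</forename> <surname>Anderson</surname></name>
    <birth><date year="1688"/><place>Salem, Massachusetts</place></birth>
  </person>
  <person>
    <name><forename>Eli</forename> <surname>Anderson</surname></name>
    <birth><date year="1721"/><place>Boston, Massachusetts</place></birth>
  </person>
  <person>
    <name><forename>Ann</forename> <surname>Baker</surname></name>
    <birth><date year="1690"/><place>Salem, Massachusetts</place></birth>
  </person>
</people>
"""


def find_births(xml_text, surname, place_part, before_year):
    """Return forenames of people matching surname, birthplace, and year."""
    matches = []
    for person in ET.fromstring(xml_text).iter("person"):
        if person.findtext("name/surname") != surname:
            continue
        place = person.findtext("birth/place", default="")
        date = person.find("birth/date")
        year = date.get("year") if date is not None else None
        if place_part in place and year is not None and int(year) < before_year:
            matches.append(person.findtext("name/forename"))
    return matches


print(find_births(DATA, "Anderson", "Massachusetts", 1700))  # ['Ruth']
```

Eli is an Anderson from Massachusetts but was born too late, and Ann was born in time but is a Baker, so only Ruth comes back.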
Michael Kay has published GEDML, an XML version of GEDCOM. The GENTECH Lexicon Working Group is developing LexML, which is a combination of the elements found in the GENTECH Genealogical Data Model and the Text Encoding Initiative (TEI). It is very likely that the Family History Department is developing an XML version of GEDCOM. It would be unfortunate for users if these versions were not compatible.
It appears that it is much easier to publish a standard than it is to manage one, and that this is especially true in family history. Management of a standard involves coordinating changes to it, testing and certifying compliance with it, and juggling the various interests of the producers and consumers. The Family History Department has not, historically, performed these functions. GENTECH, or any other group of volunteers, is unlikely to have the manpower to do so either. This is the biggest challenge for the leadership of the family history community -- to find a way to manage a standard to support the exchange of family history information. It won't likely be fixed with a little tape.
GEDCOM has been put to a number of uses that its framers did not anticipate. While it can be made useful for those purposes, it is unlikely that the Family History Department will be inclined to spend time and money to support those uses, especially changing the specification to accommodate new data requirements for historians. While many genealogists find the rapid spread of colored duct tape tasteless, reasonable choices are as scarce as calm people on Jerry Springer. We're left needing GEDCOM, and relying on volunteer projects like the GEDCOM TestBook to best exchange our stories.
And that's not so bad. What if we were waiting on Microsoft to fix it?
· The current specification (ver 5.5, 2 Jan 96)- GENDEX
· An article by Jan McClintock
· An article by Dick Eastman
· Michael Kay's GEDML, mirrored at OASIS
· TEI, the Text Encoding Initiative
· XML - and many other sites.
· GEDCOM-L mailing list - http://www.rootsweb.com/~nozell/gedcom-l/
· The GEDCOM TestBook Project - http://www.gentech.org/gedtest.htm