Don’t Stop Too Soon: You and the GENTECH Genealogical Data Model
My cousin Bob’s first experience with computer genealogy was to sit down on a sunny Saturday morning, hot coffee near at hand. He sat before a computer screen, ready to explore and document his heritage. The first screen sat dumbly and prompted for a person’s NAME, like this:
NAME:
As is the case more often than not these days, the screen was very pretty, very well designed for packing everything into 15 inches of monitor. There were colors, sepia background pictures of old people, jazzy fonts, and even sounds of doors opening and closing. The computer is a great machine for storing information, and as Bob stared at the screen, he fully expected that what he typed would echo down the halls of time forever. Each letter, word, name, date, and place would be preserved for a digital eternity, available to those who come after, as if the information he typed were a lantern for others to find The Way.
Jack in, turn on, be real. Bob was about to be very disappointed.
NAME:
The first NAME: that Bob wanted to enter was his great-grandfather’s. Bob never knew Bill Sharbrough, but he had come across some vital records and had heard a few stories. The biggest problem was that Bill was not really named Bill. He was named Calvin Bryan Sharbrough, but no one ever called him anything but Bill because, as a young man, he was apparently compared favorably to “Wild Bill.” As often happens with names, it stuck. Cousin Bob sat at the computer, looking at Bill’s birth certificate, and thought that it was somehow backward to enter the name first. Some records said Calvin and some said Bill, and as he stared at the screen and the word NAME, Bob wasn’t sure how to clear up the confusion.
NAME:
How was he going to get this data onto the information highway so that it could be included in the Digital History of the Human Race? Where, in those timeless halls of digital eternity, were the rules for getting this stuff into the computer?
The truth is that I don’t have a Cousin Bob, and Bill was my uncle. But the problem is not an unusual one. Genealogists want to participate in a great project, one without a name or an organizing authority: the digitization of the whole of human history. We want to know our place in the Family Tree of Man. I personally want to find a link to Scott Joplin someday and silence my critics. But as is so often the case in life, the Way to Truth is a narrow one and is not easy to find. It must be a path that few people have taken before us, or we could just follow the beer cans to Perfect Knowledge.
Seriously, I’m not on anything but coffee. But I don’t understand how the credit card companies and the insurance companies can build better databases than genealogists can. I don’t know all of the answers, but I contend that the GENTECH Genealogical Data Model (GDM) is a first step toward creating a Digital Library of Man. There will be such a library someday. A child will be able to wander through it, just as she would wander through a physical library with books on the shelves, and learn about her heritage as easily as she might chase a butterfly through a field. My kids won’t, but their kids might.
The GDM maps the activities of genealogists into three broad categories: EVIDENCE, CONCLUSIONS, and ADMINISTRATION. While this division might seem simple to you, please remember the software that you had to use in 1995. Sheesh. There was no place for anything but CONCLUSIONS unless you typed your sources into NOTES.
Cousin Bob had a problem because the computer program didn’t work the way genealogists work. Genealogists start with EVIDENCE, and the computer programs available today start with CONCLUSIONS. According to the GENTECH Genealogical Data Model, genealogists get from EVIDENCE to CONCLUSIONS by making ASSERTIONS.
Forget about computers. Remember when everyone did their research and kept their records on paper. A man was born and died, and all that remained of him was a set of vital records, some census entries, a gravestone, and a letter or two. As a family historian, you would assemble lots of those records and try to see which ones went with which people. When you grouped them together for a person, you were, in GDM terms, ASSERTING a CONCLUSION about a PERSONA.
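To make that paper-era process concrete, here is a minimal Python sketch. It is not the GDM’s own schema (the GDM deliberately is not a database design); the classes, fields, and record descriptions are hypothetical, loosely named after GDM entities. Each record yields a PERSONA that keeps the name exactly as that record gives it, and a separate ASSERTION groups the personas into one man and says who decided so, and why.

```python
from dataclasses import dataclass

# Hypothetical classes, loosely named after GDM entities. The GDM is not a
# database design, so this is one illustration, not the model's schema.


@dataclass(frozen=True)
class Persona:
    """A person exactly as a single record describes him or her."""
    persona_id: str
    name_as_recorded: str
    record: str  # plain-text description of the evidence (hypothetical)


@dataclass(frozen=True)
class Assertion:
    """A researcher's claim that several personas are one individual."""
    researcher: str
    personas: tuple[Persona, ...]
    rationale: str


def names_for(assertion: Assertion) -> list[str]:
    """Every name the grouped evidence uses, first-seen order, no repeats."""
    names: list[str] = []
    for persona in assertion.personas:
        if persona.name_as_recorded not in names:
            names.append(persona.name_as_recorded)
    return names


# EVIDENCE first: each record keeps the name it actually uses.
from_birth_certificate = Persona("P1", "Calvin Bryan Sharbrough", "birth certificate")
from_other_record = Persona("P2", "Bill Sharbrough", "a record using his nickname")

# The CONCLUSION is a separate, attributable step.
same_man = Assertion(
    researcher="Bob",
    personas=(from_birth_certificate, from_other_record),
    rationale="No one ever called Calvin anything but Bill.",
)

print(names_for(same_man))  # ['Calvin Bryan Sharbrough', 'Bill Sharbrough']
```

In this sketch nothing forces a single answer at a NAME: prompt. Both names stay attached to the records that use them, and the decision that they belong to one man is stored alongside who made it and why.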
The GDM is a Request for Comment (an "RFC," in Internet parlance): an invitation to a discussion. Genealogists and developers created the GDM. It's not a standards document in the classical sense. It's a suggestion for describing genealogy processes, and it describes the relationships between the various kinds of family history information.
It's NOT a genealogy program. It's NOT a database design. It's NOT a document saying what genealogists SHOULD do.
The GDM is a product of cooperation among GENTECH, FGS, NGS, NEHGS, BCG, and APG. You'd think that everybody liked it, but that's not true, for a number of reasons. Some of the discussion seems like arguing about how many angels can dance on the head of a pin, or whether to count ballots without votes on them.
What’s wrong with it? Well, for starters, it doesn't tell a developer how to relate the parts of a name to each other, or one name to another. It doesn't describe relationships between places, or between dates. Names, places, and dates, the three most important pieces of every evidentiary citation, are thrown into tidy black boxes and left for another day.
That said, it is not without its practical uses. The following outline shows some of the parts of administration, evidence, and conclusion.
ADMINISTRATION
· Planning your projects: research objectives and activities
· Defining surety schemes
· Defining source groups
In GDM terms, I’ve defined a RESEARCH OBJECTIVE, RESEARCH ACTIVITY, SOURCE GROUP, RESEARCHER, and PROJECT.
EVIDENCE
· REPOSITORY
· SOURCE
· REPRESENTATION TYPE
· REPRESENTATION
· CITATION
I defined a REPOSITORY, a SOURCE, and a REPRESENTATION.
CONCLUSIONS
ASSERTIONS about PERSONAS, EVENTS, CHARACTERISTICS, and GROUPS, for example:
· ASSERTION: Joseph was Jonathan’s son.
· ASSERTION: Jehu was Jonathan’s son.
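Assertions like these become much more useful when each one carries its CITATION, its surety, and the RESEARCHER who made it. The following Python sketch is one hypothetical way to record that; the class names, the three-level surety scheme, the researchers, and the sources are all invented for the example, since the GDM leaves surety schemes to the researcher and prescribes no storage format.

```python
from dataclasses import dataclass

# Hypothetical surety scheme. The GDM lets researchers define their own;
# this three-level scale is invented for the example.
SURETY = {"weak": 1, "probable": 2, "strong": 3}


@dataclass(frozen=True)
class Citation:
    source: str  # the SOURCE the evidence came from (invented below)
    detail: str  # where in that source the evidence appears


@dataclass(frozen=True)
class RelationshipAssertion:
    subject: str
    relationship: str
    obj: str
    citation: Citation
    surety: str
    researcher: str

    def text(self) -> str:
        return f"{self.subject} was {self.obj}'s {self.relationship}."


def trace(assertions, subject, relationship, obj):
    """List who supports a claim, from which source, strongest first."""
    matches = [
        a for a in assertions
        if (a.subject, a.relationship, a.obj) == (subject, relationship, obj)
    ]
    matches.sort(key=lambda a: SURETY[a.surety], reverse=True)
    return [(a.researcher, a.citation.source, a.citation.detail, a.surety)
            for a in matches]


# Invented evidence, for illustration only.
claims = [
    RelationshipAssertion("Joseph", "son", "Jonathan",
                          Citation("Jonathan's will", "bequest clause"),
                          "strong", "Researcher A"),
    RelationshipAssertion("Jehu", "son", "Jonathan",
                          Citation("county tax list", "same household"),
                          "weak", "Researcher B"),
    RelationshipAssertion("Jehu", "son", "Jonathan",
                          Citation("family Bible", "birth entry"),
                          "probable", "Researcher A"),
]

print(claims[0].text())  # Joseph was Jonathan's son.
for row in trace(claims, "Jehu", "son", "Jonathan"):
    print(row)
```

Because every assertion keeps its citation and researcher, a weak link (here, the tax-list reading) can be found and re-examined without throwing away the rest of the work.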
Still, this long after the introduction of GDM 1.0, the discussion is short of concrete examples. We need people to produce them for common genealogical activities. To further that goal, I present a challenge to the readers of this article: if you work on examples and send them to me, and they truly demonstrate the model, I’ll publish them in a subsequent article and on the GENTECH website, and attribute them in the lectures that I give on the topic.
What does the GDM give us, potentially, that we didn’t have before? Here are a few ideas that you might use.
The separation of EVIDENCE from CONCLUSIONS is a powerful message. The available software has forced us to abandon our normal research process for long enough. Please tell the developer of your favorite program to let you distinguish between your evidence and your conclusions.
I don’t know of a product on the market today that shows the train of thought connecting a source to a conclusion. Wouldn’t it be great if we could follow that train of thought, so that when we find an error, we can see where it came from and correct it?
Would you like to team up with other genealogists on a project? How could you tell which person read which sources and formed which conclusions? Wouldn’t it be great if your grandchildren could build onto your work?
The GDM presents an opportunity for us to change the dialog about computer genealogy. Now, we can focus on questions such as:
What are we doing now?
The parts of the GDM lend themselves to markup. Markup is the process of “marking up” text with tags that say what each piece of the text is. The goal of the current Lexicon Working Group project, Lexicon2, is to create a set of XML tags, called LexML, that are consistent with the GDM.
Here are a few examples of some text, and how it looks when it’s marked up.
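As a rough illustration of what marked-up text can look like, this Python sketch uses the standard library’s xml.etree.ElementTree to tag the pieces of one of the assertions from the outline. The tag and attribute names (assertion, persona, role, relationship) are hypothetical placeholders, not LexML, whose tag set the Lexicon2 project is still defining.

```python
import xml.etree.ElementTree as ET

# Placeholder tag names. These are NOT LexML; that tag set is still
# being defined by the Lexicon2 project.


def mark_up_relationship(child: str, parent: str, relationship: str) -> str:
    """Tag the parts of a simple parent-child sentence."""
    root = ET.Element("assertion")
    child_el = ET.SubElement(root, "persona", role="child")
    child_el.text = child
    child_el.tail = " was "
    parent_el = ET.SubElement(root, "persona", role="parent")
    parent_el.text = parent
    parent_el.tail = "'s "
    rel_el = ET.SubElement(root, "relationship")
    rel_el.text = relationship
    rel_el.tail = "."
    return ET.tostring(root, encoding="unicode")


marked_up = mark_up_relationship("Joseph", "Jonathan", "son")
print(marked_up)

# A program can now pull out the pieces it cares about...
tree = ET.fromstring(marked_up)
print([p.text for p in tree.findall("persona")])  # ['Joseph', 'Jonathan']
# ...while the plain reading text is unchanged.
print("".join(tree.itertext()))  # Joseph was Jonathan's son.
```

The tags add meaning for programs and search engines without changing what a human reader sees, which is the property that makes markup attractive for publishing genealogy on the web.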
You might wonder why we would choose XML as our direction. There are several reasons, including:
· XML is easy to output from genealogy programs
· XML is easy to search on web pages
· XML means better web searches for genealogy information
At some point in the future, we might find that:
· programs publish pedigrees and registers in LexML format
· repositories publish records in the same format
· our programs store, locally, links to remote records kept by external authorities, such as census or vital records
If that were to happen, some sites would have lots of links coming in to them, and some would have lots of links going out. The most-quoted sites would be called “authorities” and might be like the FamilySearch site today. The sites with many links going out would be called “hubs,” and Cyndi’s List is a great example. The sites that linked together most closely would define culture, tribe, and family, in a networking sense.
That future is not yet here; some ingredients are still missing. First, we need widespread agreement on a LexML standard. Second, we will need wide acceptance of that agreed standard. And finally, we will need wide implementation of LexML.