Don’t Stop Too Soon:

You and the GENTECH Genealogical Data Model

 

My Little Corner of the Universe

My cousin Bob’s first experience with computer genealogy was to sit down on a sunny Saturday morning, hot coffee near at hand. He sat before a computer screen, ready to explore and document his heritage. The first screen sat dumbly and prompted for a person’s NAME. Like this:

NAME:

As is the case more often than not these days, the screen was very pretty, very well designed for packing everything into 15 inches of monitor. There were colors and sepia background pictures of old people and jazzy fonts and even sounds of doors opening and closing. The computer is a great machine for storing information and as Bob stared at the screen, he fully expected that what he typed would echo down the halls of time forever. Each letter, word, name, date, and place would be preserved for a digital eternity, available to those who come after, as if the information he typed was a lantern for others to find The Way.  Jack in, turn on, be real. Bob was about to be very disappointed.

NAME:

The first NAME: that Bob wanted to enter was his great grandfather’s. Bob never knew Bill Sharbrough, but he had come across some vital records and had heard a few stories. The biggest problem was that Bill was not really named Bill. He was named Calvin Bryan Sharbrough, but no one ever called him anything but Bill because as a young man he was apparently compared favorably to “Wild Bill.” As often happens with names, it stuck. Cousin Bob sat at the computer, looking at Bill’s birth certificate, and thought that it was somehow backward to enter the name first.  Some records said Calvin and some said Bill, and Bob wasn’t sure how to clear up the confusion, as he stared at the screen and the word NAME.

NAME:

 

How should he enter it?
What was everyone else doing?

How was he going to get this data onto the information highway so that it could be included in the Digital History of the Human Race? Where, in those timeless halls of digital eternity, were the rules for getting this stuff into the computer?

The truth is that I don’t have a Cousin Bob and Bill was my uncle. But the problem is not an unusual one. Genealogists want to participate in a great project, one without a name and an organizing authority: the digitization of the whole of human history. We want to know our place in the Family Tree of Man. I personally want to find a link to Scott Joplin someday and silence my critics. But as is so often the case in life, the Way to Truth is a narrow one, and is not easy to find. It must be a path that wasn’t taken by many people before us, or you would just be able to follow the beer cans to Perfect Knowledge.

Seriously, I’m not on anything but coffee. But I don’t understand how the credit card companies and the insurance companies can build better databases than genealogists can. I don’t know all of the answers, but I contend that the GENTECH Genealogical Data Model (GDM) is a first step toward creating a Digital Library of Man. There will be such a library someday. A child will be able to wander through it, just as she would wander through a physical library with books on the shelves, and learn about her heritage as easily as she might chase a butterfly through a field. My kids won’t, but their kids might.

The GDM maps the activities of genealogists into three broad categories: EVIDENCE, CONCLUSIONS, and ADMINISTRATION. While this division might seem simple to you, please remember the software that you had to use in 1995. Sheesh. There was no place for anything but CONCLUSIONS unless you typed your sources into NOTES.

Cousin Bob had a problem because the computer program didn’t work the way genealogists work. Genealogists start with EVIDENCE, and the computer programs available today start with CONCLUSIONS. According to the GENTECH Genealogical Data Model, we get from EVIDENCE to CONCLUSIONS by making ASSERTIONS.

Forget about computers. Remember when everyone did their research and kept their records using paper. A man was born and died and all that remained of him was a set of vital records, some census entries, a gravestone and a letter or two. As a family historian, you would assemble lots of those records and try to see which ones went with what people. When you grouped them together for a person, you were ASSERTING a CONCLUSION about a PERSONA, in GDM terms.


What IS it?

The GDM is a Request for Comments (in Internet parlance an "RFC"), an invitation to a discussion. Genealogists and developers created the GDM. It's not a standards document, in the classical sense. It's a suggestion for describing genealogy processes. It describes the relationships between the various kinds of family history information.

It's NOT a genealogy program. It's NOT a database design. It's NOT a document saying what genealogists SHOULD do.

 

Every genealogist says that they do research differently.

The GDM describes the processes that they do differently.

 

The GDM is a product of cooperation by GENTECH, FGS, NGS, NEHGS, BCG and APG. You'd think that everybody liked it, but that's not true. There are a number of reasons. Some of the discussion seems like we're arguing about how many angels can dance on the head of a pin, or whether to count ballots without votes on them.

What’s wrong with it? Well, for starters, it doesn't tell a developer how to relate parts of names and different names. It doesn't describe relationships between places, or dates. The three most important pieces of every evidentiary citation are thrown into tidy black boxes and left for another day.

 

That said, it is not without its practical uses. The following outline shows some of the parts of administration, evidence, and conclusion.

 

Practical Uses – ADMINISTRATION

 

·                     Planning your projects - research objectives and activities

·                     Defining surety schemes

·                     Defining source groups

 

Administration Example - Look for Jonathan Sharbrough estate papers at Family History Library in Salt Lake City.

 

In GDM terms, I’ve defined a …

RESEARCH OBJECTIVE,

RESEARCH ACTIVITY,

SOURCE GROUP,

RESEARCHER, and

PROJECT

 

Practical Uses - EVIDENCE

·         REPOSITORY

·         SOURCE

·         REPRESENTATION TYPE

·         REPRESENTATION

·         CITATION

 

EVIDENCE EXAMPLE - I find Jonathan Sharbrough’s Estate Papers on microfilm at FHL.

 

I defined a REPOSITORY, a SOURCE, and a REPRESENTATION.

 

 

Practical Uses - CONCLUSIONS

ASSERTIONS about …

PERSONA

EVENTS

CHARACTERISTICS

GROUPS

ASSERTIONS

 

CONCLUSION EXAMPLE

ASSERTION: Joseph was Jonathan’s son.

ASSERTION: Jehu was Jonathan’s son.
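
For readers who think in code, the evidence and conclusion examples above can be sketched as plain data structures. This is only an illustration, not the GDM's actual schema: the class names borrow GDM entity names, but the fields are my own simplifications, and I am assuming for the sake of the example that both assertions rest on the estate papers.

```python
from dataclasses import dataclass

# Simplified, hypothetical stand-ins for GDM entities.
# The field names are illustrative assumptions, not the model's attributes.

@dataclass(frozen=True)
class Repository:
    name: str

@dataclass(frozen=True)
class Source:
    title: str
    repository: Repository
    representation: str  # the form in which the source was seen

@dataclass(frozen=True)
class Persona:
    name: str

@dataclass(frozen=True)
class Assertion:
    subject: Persona
    relation: str
    other: Persona
    evidence: Source  # assumed here: what the assertion is based on

# EVIDENCE: a repository, a source, and a representation.
fhl = Repository("Family History Library, Salt Lake City")
estate = Source("Jonathan Sharbrough estate papers", fhl, "microfilm")

# CONCLUSIONS: two assertions that tie personas together.
jonathan = Persona("Jonathan Sharbrough")
joseph = Persona("Joseph")
jehu = Persona("Jehu")

assertions = [
    Assertion(joseph, "son of", jonathan, estate),
    Assertion(jehu, "son of", jonathan, estate),
]

def sons_of(parent, all_assertions):
    """Return the names of personas asserted to be sons of `parent`."""
    return sorted(
        a.subject.name
        for a in all_assertions
        if a.relation == "son of" and a.other == parent
    )

print(sons_of(jonathan, assertions))
```

The point of the sketch is that each conclusion carries a pointer back to its evidence, so nothing has to be typed into NOTES.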

 

Still, this long after the introduction of GDM 1.0, the discussion is hurting for concrete examples. We need people to try to produce them for common genealogical activities. In order to further that goal, I present a challenge to the readers of this article. If you will work on examples and send them to me, and they truly demonstrate the model, I’ll publish them in a subsequent article, on the GENTECH website, and attribute them in the lectures that I give on the topic.

 

CHALLENGE: Describe common research activities in terms of the entities in the model. For example:

•Census

•Will

•Deed

•Bible

•Funeral Program

 

 

 

 

What does the GDM give us, potentially, that we didn’t have before? Here are a few ideas that you might use.

 

Stop Starting with Conclusions

      Don’t start with conclusions, start with evidence.

This is a powerful message. The available software has forced us to abandon our normal process of research long enough. Please tell the developer of your favorite program to let you distinguish between your evidence and your conclusions.

 

Sharing your thought process

      Show your work!

I don’t know of a product on the market today that shows the train of thought connecting a source to a conclusion. Wouldn’t it be great if we could see it, so that when we find an error we can identify where it crept in and correct it?

 

Group Work

      When is it a source and when is it a conclusion?

One man’s conclusion is another man’s source.

 

Would you like to team up with other genealogists on a project? How could you tell which person read which sources and formed which conclusions? Wouldn’t it be great if your grandchildren could build onto your work?
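
Here is one way the "one man’s conclusion is another man’s source" idea might look in code. It is a rough sketch under my own assumptions: the `Step` class, its fields, the researcher labels, and the second researcher's conclusion are all hypothetical, invented only to show a chain being walked back to its source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    """One link in a chain of reasoning (hypothetical structure)."""
    who: str                           # who recorded or concluded it
    statement: str                     # what was recorded or concluded
    based_on: Optional["Step"] = None  # None marks an original source

def trail(step):
    """Walk from a conclusion back to the source it ultimately rests on."""
    lines = []
    while step is not None:
        lines.append(f"{step.who}: {step.statement}")
        step = step.based_on
    return lines

# The source, then two researchers building on each other (labels are made up).
source = Step("FHL microfilm", "Jonathan Sharbrough estate papers")
first = Step("Researcher A", "Joseph was Jonathan's son", based_on=source)
second = Step("Researcher B", "Joseph belongs in Jonathan's family group",
              based_on=first)

for line in trail(second):
    print(line)
```

If Researcher A's reading of the estate papers turns out to be wrong, the trail shows exactly which later conclusions depend on it.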

 

 

The GDM presents an opportunity for us to change the dialog about computer genealogy. Now, we can focus on questions such as:

      What was your evidence?

      What did you assert?

 

Data Modeling means better understanding of genealogy processes.


What are we doing now?

The parts of the GDM lend themselves to markup. Markup is the name for a process of “marking up” text with tags that say what the pieces of text are. The goal of the current Lexicon Working Group Project, Lexicon2, is to create a set of XML tags that are consistent with the GDM, called LexML.

XML is the eXtensible Markup Language.

 

Here are a few examples of some text, and how it looks when it’s marked up.

 

      <TITLE>The Title of My Book</TITLE>

      <NAME>Jonathan Sharbrough</NAME>

      <BIRTHDATE>circa 1734</BIRTHDATE>

      <BIRTH PLACE="North Carolina" DATE="circa 1734"></BIRTH>
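
To give a feel for why tagged text is convenient for programs, here is a small sketch that reads a marked-up record with the XML parser in Python's standard library. The tag and attribute names come from the samples above and are not LexML, which doesn't exist yet; the PERSON wrapper is my own assumption, added because an XML document needs a single outer element. Note that a parser insists on quoted attribute values and matched tags.

```python
import xml.etree.ElementTree as ET

# Illustrative record only; these are not real LexML tags.
snippet = """
<PERSON>
  <NAME>Jonathan Sharbrough</NAME>
  <BIRTH PLACE="North Carolina" DATE="circa 1734"></BIRTH>
</PERSON>
""".strip()

person = ET.fromstring(snippet)

# Element text and attributes are now available by name.
name = person.findtext("NAME")
birth = person.find("BIRTH")

print(name)
print(birth.get("PLACE"))
print(birth.get("DATE"))
```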

 

You might wonder why we would choose XML as our direction. There are several reasons. They include:

·        XML is easy to output from genealogy programs

·        XML is easy to search on web pages

·        XML means better web searches for genealogy information.


 

What’s Next?

 

At some point in the future, we might find that:

·        programs publish pedigrees and registers in LexML format

·        repositories publish records  in the same format

·        our programs store local links to remote sources of external authorities, such as census or vital records.

If that were to happen, there would be some sites with lots of links coming in to them, and some sites with lots of links coming out of them. The most quoted sites would be called “authorities” and might be like the FamilySearch site today. The sites with many links coming out of them would be called “hubs,” and Cyndi’s List is a great example of that. The sites that linked together most closely would define culture, tribe, and family, in a networking sense.
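
The hub-and-authority idea can be tried out by simply counting links. The sketch below is a toy, not a real ranking algorithm: the site names are made-up placeholders, and it only tallies incoming and outgoing links.

```python
from collections import Counter

# Hypothetical links as (linking site, linked-to site). All names are invented.
links = [
    ("my-family-page", "census-archive"),
    ("cousin-page", "census-archive"),
    ("link-directory", "census-archive"),
    ("link-directory", "vital-records"),
    ("link-directory", "my-family-page"),
]

# Count links coming in to each site and links going out of each site.
incoming = Counter(target for _, target in links)
outgoing = Counter(origin for origin, _ in links)

# The most-linked-to site acts as an "authority";
# the site with the most outgoing links acts as a "hub".
authority = incoming.most_common(1)[0][0]
hub = outgoing.most_common(1)[0][0]

print("authority:", authority)
print("hub:", hub)
```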

 

The digital future of family history is a virtual library where it is ...

      Easy to find the conclusions

      Easy to identify the evidence

      Easy to identify the thought process that links them.

 

That future is not yet here; there are some missing ingredients. First, we need widespread agreement on a LexML standard. Second, we need wide acceptance of that agreed standard. And finally, we need wide implementation of LexML.