
GEDCOM, an acronym for GEnealogical Data COMmunication, is a specification for exchanging genealogical data between different genealogy software. GEDCOM was developed by The Church of Jesus Christ of Latter-day Saints as an aid to genealogical research.
A GEDCOM file is plain text (usually either ANSEL or ASCII) containing genealogical information about individuals, and metadata linking these records together. Most genealogy software supports importing from or exporting to GEDCOM format, or both. However, some genealogy programs use proprietary extensions to the GEDCOM format, which are not always recognized by other genealogy programs. The [https://www.ngsgenealogy.org/ngsgentech/projects/TestBook2001/index.cfm GEDCOM TestBook Project] evaluates how well [https://www.ngsgenealogy.org/ngsgentech/projects/TestBook2001/sumchart.cfm popular genealogy programs] conform to the GEDCOM 5.5 standard. Additionally, many tools exist to convert GEDCOM files to HTML pages.
GEDCOM uses a lineage-linked data model. This data model is based on the nuclear family and the individual. This contrasts with evidence models, where data is structured to reflect the discovered and supporting evidence. In the GEDCOM lineage-linked data model, all data is structured to reflect the believed reality, that is, actual (or hypothesized) nuclear families and individuals.
Commsoft (http://sonic.net/~commsoft/rstory.html), the authors of the Roots series of genealogy software and Ultimate Family Tree, defined a version called Event GEDCOM (http://archiver.rootsweb.com/th/read/TMG/2000-06/0962255126). Although it is event-based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and their participants. Roots and Ultimate Family Tree are no longer available, so very few people now use Event GEDCOM.
GEDCOM files are somewhat similar to MARC, an interchange format for bibliographic data.
A GEDCOM file consists of a header section, records, and a trailer section.
Records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous records, including notes.
Every line of a GEDCOM file begins with a level number. All top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line with level 0. All other level numbers are positive integers. Although it is theoretically possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator that checks the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator.
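The line structure described above can be sketched in Python. This is a minimal illustration, not a conforming parser: the names `LINE_RE`, `GedcomLine` and `parse_line` are hypothetical, and the regular expression assumes that a line is a level number, an optional `@ID@` cross-reference, a tag, and an optional value separated from the tag by one space.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape: level, optional @XREF@ identifier, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@\s]+@)\s+)?(\w+)(?:\s(.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]   # e.g. "@I1@" on a record's first line, else None
    tag: str              # e.g. "INDI", "NAME", "TRLR"
    value: Optional[str]  # rest of the line, or None if the tag stands alone


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into its parts; raise ValueError if it does not fit."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)
```

Under these assumptions, `parse_line("0 @I1@ INDI")` yields level 0 with the cross-reference `@I1@` and the tag `INDI`, while `parse_line("1 HUSB @I1@")` yields level 1 with the tag `HUSB` and the pointer `@I1@` as its value.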
The following is a sample GEDCOM file. The number at the start of each line is its level, which shows how deeply the line is nested.
The header (HEAD) includes the source program and version (Reunion, V8.0), the GEDCOM version (5.5), and the character encoding (MACINTOSH).
The individual records (INDI) define Bob Cox (ID 1, written @I1@), Joann Para (ID 2), and Bobby Jo Cox (ID 3).
The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
0 HEAD
1 SOUR Reunion
2 VERS V8.0
2 CORP Leister Productions
1 DEST Reunion
1 DATE 11 FEB 2006
1 FILE test
1 GEDC
2 VERS 5.5
1 CHAR MACINTOSH
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
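To illustrate how the family record ties the individuals together, the following Python sketch reads a shortened copy of the sample and resolves the HUSB, WIFE and CHIL pointers to names. It is a simplified illustration under stated assumptions, not a full GEDCOM reader: `read_records` and `describe_family` are hypothetical helper names, only level-0 and level-1 lines are considered, and each line is assumed to be well formed with one record per line.

```python
from collections import defaultdict

# Shortened copy of the sample file (CHAN/DATE and most header lines omitted).
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
"""


def read_records(text):
    """Collect the level-1 tags of every level-0 record that carries an @ID@."""
    records = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # "0 @I1@ INDI" carries an ID; "0 HEAD" and "0 TRLR" do not.
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "tags": defaultdict(list)}
                records[parts[1]] = current
            else:
                current = None
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) == 3 else ""
            current["tags"][parts[1]].append(value)
    return records


def describe_family(records, fam_id):
    """Follow the pointers in a FAM record to the names of its members."""
    fam = records[fam_id]["tags"]

    def name(ref):
        # Slashes mark the surname in a NAME value; drop them for display.
        return records[ref]["tags"]["NAME"][0].replace("/", "")

    return {
        "husband": name(fam["HUSB"][0]),
        "wife": name(fam["WIFE"][0]),
        "children": [name(ref) for ref in fam["CHIL"]],
    }


family = describe_family(read_records(SAMPLE), "@F1@")
print(family)
# {'husband': 'Bob Cox', 'wife': 'Joann Para', 'children': ['Bobby Jo Cox']}
```

The links run in both directions: the FAM record points at the individuals, and each INDI record points back at the family through FAMS (as a spouse) or FAMC (as a child).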
The current version of the specification is GEDCOM 5.5, which was released on 12 January, 1996. A subsequent draft GEDCOM 5.5.1 specification was issued in 1999, introducing nine new tags, including WWW, EMAIL and FACT, and adding UTF-8 as an approved character encoding. This draft has not been formally approved, but its provisions have been adopted in some part by a number of genealogy programs.
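Because the character encoding is declared in the header's CHAR line, a reader has to choose a decoder before it can interpret the rest of the file. The following Python sketch shows one way this might be done. The mapping and the name `codec_for` are assumptions made for illustration: in particular, treating UNICODE as UTF-16 and MACINTOSH as Mac Roman is a guess about common practice rather than something stated in this article, and ANSEL is left unmapped because the Python standard library has no ANSEL codec.

```python
import codecs

# Assumed mapping from CHAR header values to Python codec names.
# ANSEL is deliberately absent: the standard library ships no ANSEL codec.
CHAR_TO_CODEC = {
    "ASCII": "ascii",
    "UTF-8": "utf-8",
    "UNICODE": "utf-16",       # assumption: UNICODE files are UTF-16
    "MACINTOSH": "mac_roman",  # assumption: the CHAR value used in the sample file
}


def codec_for(char_value):
    """Return a Python codec name for a CHAR value, or None if it is unmapped."""
    name = CHAR_TO_CODEC.get(char_value.strip().upper())
    if name is not None:
        codecs.lookup(name)  # raises LookupError if this Python lacks the codec
    return name
```

A caller that receives `None`, as it would for ANSEL, needs a third-party or hand-written decoder before reading the file.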
As mentioned above, there was also a version (at least a beta version) of "Event GEDCOM", which included events as first class (zero-level) items. However, this has not been widely adopted, and the lineage-linked GEDCOM is still the de facto common denominator.
On January 23, 2002 a beta version of GEDCOM 6.0 was released for developers to study and begin to implement in their software.[1] GEDCOM 6.0 was to be the first version to store data in XML format, and was to change the preferred character set from ANSEL to Unicode. (Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original CJK characters, without which they could be ambiguous and of little use for genealogical or historical research.)
As of 2007, five years after the publication of the beta version of GEDCOM 6.0, no genealogical software supplier supports it, despite the inherent advantages of an extensible and portable language like XML and its multilingual Unicode support. The few exceptions, such as GedXML, are listed in the references.
The relationship-based file structure stores events as details under individual and family records. This makes some events more difficult to organize and elaborate, and leads to ambiguity about which record should "own" an event. For example, the details of an adoption could be associated with the child, the adoptive parents, the birth parents, or the family of which the child becomes part.
GEDCOM is philosophically oriented towards describing simple, non-conflicting factual data. Many genealogists would prefer a model oriented more to documenting evidence, and separating the steps of collecting data from the steps of making deductions as to underlying history. GEDCOM does include one primitive attribute (QUAY) for reliability, but aside from that, it provides no guidance for handling conflicting data, and deductions amongst related data.
The representation of names is not very flexible or detailed. No provision is made for entering names in different scripts (e.g., Hangul and Hanja, and perhaps Latin, for Korean names). The standard gives little detail on how to use the name tags that it includes.