I am a hobbyist Xojo user. I want to import a GEDCOM file into my program, specifically into an SQLite database.
- ID: Integer
- Gender: Varchar // M, F or U
- Surname: Varchar
- Givenname: Varchar

- ID: Integer
- Husband: Integer
- Wife: Integer

- ID: Integer
- PersonID: Integer
- FamilyID: Integer
- Order: Integer

- ID: Integer
- PersonID: Integer
- EventType: Varchar // e.g. BIRT, DEAT, BURI, CHR
- Date: Varchar
- Description: Varchar
- Order: Integer

- ID: Integer
- RelationshipID: Integer
- EventType: Varchar // e.g. MARR, DIV, DIVF
- Date: Varchar
- Description: Integer
- Order: Integer
I wrote a working GEDCOM line parser. It splits a single GEDCOM line into:
- Level As Integer
- Reference As String // optional
- Tag As String
- Value As String // optional
I load the GEDCOM file via TextInputStream (working fine). Now I need to parse every line.
0 @I1@ INDI
1 NAME George /Clooney/
2 GIVN George
2 SURN Clooney
1 BIRT
2 DATE 6 MAY 1961
2 PLAC Lexington, Fayette County, Kentucky, USA
As you can see, the level numbers form a tree structure. So I thought the best and simplest way would be to parse the file into separate objects (PersonObj, RelationshipObj, EventObj etc.) inside a JSONItem, because there it is easy to get the children of a node. Later on, I can simply read the nodes and their child nodes to create the database entries. But I don't know how to write such an algorithm.
Can anyone help me, please?
To parse the GEDCOM lines at a good speed, try these ideas:
Read the entire file into a String and split the lines up:
dim f as FolderItem = ...
dim fileContent as String = TextInputStream.Open(f).ReadAll
fileContent = fileContent.DefineEncoding (Encodings.WindowsLatin1)
dim lines() as String = ReplaceLineEndings(fileContent,EndOfLine).Split(EndOfLine)
Parse every line using RegEx to extract its three columns:
dim re as new RegEx
re.SearchPattern = "^(\d+) ([^ ]+)(.*)$"
for each line as String in lines
  dim rm as RegExMatch = re.Search(line)
  if rm = nil then
    // Nothing found in this line. Is this correct?
    break // pauses in the debugger so the line can be inspected
    continue // -> onward with next line
  end if
  dim level as Integer = rm.SubExpressionString(1).Val
  dim code as String = rm.SubExpressionString(2)
  dim value as String = rm.SubExpressionString(3).Trim
  // ... process the level, code and value
next
The RegEx search pattern looks for the start of the line ("^"), then one or more digits ("\d+"), a blank, one or more non-blank characters ("[^ ]+"), and finally any remaining characters (".*") up to the end of the string ("$"). The parentheses around each of these groups make their matches available through SubExpressionString().
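One caveat with a three-column pattern: on a record line such as "0 @I1@ INDI" the second group captures the reference (@I1@) and the tag (INDI) ends up in the value. Below is a minimal sketch of a four-column variant with an optional reference group. It is written in Python rather than Xojo so that it can be run as-is; the pattern uses only basic regex syntax and should carry over to Xojo's RegEx, though I have not tested that. The names GEDCOM_LINE and parse_line are made up for this illustration.

```python
import re

# Assumed line shape: LEVEL [@REFERENCE@] TAG [VALUE]
# Leading whitespace is tolerated because some files are indented (an assumption).
GEDCOM_LINE = re.compile(r"^\s*(\d+) (?:(@[^@ ]+@) )?([^ ]+)(?: (.*))?$")

def parse_line(line):
    """Return (level, reference, tag, value), or None if the line does not match."""
    m = GEDCOM_LINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    level, ref, tag, value = m.groups()
    # Missing optional groups come back as None; normalise them to empty strings.
    return int(level), ref or "", tag, value or ""

print(parse_line("0 @I1@ INDI"))
print(parse_line("2 DATE 6 MAY 1961"))
```

A pointer that appears as a value, for example "1 HUSB @I1@", still lands in the value column, because the optional reference group only matches directly after the level number.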
The check for rm = nil triggers whenever the line does not contain at least a number, a blank and at least one more character. This may be the case if the GEDCOM file is malformed or has blank lines.
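As for the tree itself, one common approach is to keep a stack of the currently open nodes, one per level, and attach each new line to the node one level above it. The following is a minimal sketch of that idea, assuming every line has already been split into level, reference, tag and value (as the line parser in the question does). It is Python rather than Xojo so that it can be run directly; build_tree, find_child and the dictionary keys are illustrative names, not part of any library. In Xojo the same logic should work with an array used as the stack and JSONItem or a small node class in place of the dictionaries.

```python
def build_tree(records):
    """Build a nested tree from (level, reference, tag, value) tuples in file order."""
    root = {"tag": "ROOT", "children": []}
    stack = [root]  # stack[n + 1] is the most recent open node at level n
    for level, ref, tag, value in records:
        node = {"ref": ref, "tag": tag, "value": value, "children": []}
        # Close every open node at the same or a deeper level than the new one.
        del stack[level + 1:]
        if len(stack) < level + 1:
            # The level jumped by more than one step, so there is no parent.
            raise ValueError("line %r has no parent at level %d" % (tag, level - 1))
        stack[-1]["children"].append(node)
        stack.append(node)
    return root

def find_child(node, tag):
    """Return the first direct child with the given tag, or None."""
    for child in node["children"]:
        if child["tag"] == tag:
            return child
    return None

# The sample record from the question, already split into columns.
records = [
    (0, "@I1@", "INDI", ""),
    (1, "", "NAME", "George /Clooney/"),
    (2, "", "GIVN", "George"),
    (2, "", "SURN", "Clooney"),
    (1, "", "BIRT", ""),
    (2, "", "DATE", "6 MAY 1961"),
    (2, "", "PLAC", "Lexington, Fayette County, Kentucky, USA"),
]
tree = build_tree(records)
person = tree["children"][0]
birth = find_child(person, "BIRT")
print(find_child(birth, "DATE")["value"])  # prints "6 MAY 1961"
```

Each level-0 node is then one complete record (INDI, FAM and so on), so writing the database rows becomes a loop over the root's children that reads the sub-nodes it needs.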
Hope this helps.