xmlsqlitecsvcommand-linexml2

How to import XML by piping from command line?


I'm converting from XML to CSV with xml2 | 2csv tool, then call .import to SQLite. I don't want to generate a CSV file, but want to pipe the converted file from xml2 to the .import query.

I have tags.xml:

<tags>
  <row Id="1" TagName=".net" Count="316293" ExcerptPostId="3624959" WikiPostId="3607476" />
  <row Id="2" TagName="html" Count="1116853" ExcerptPostId="3673183" WikiPostId="3673182" />
  <row Id="3" TagName="javascript" Count="2343663" ExcerptPostId="3624960" WikiPostId="3607052" />
</tags>

So instead of doing this conversion:

xml2 | 2csv row @Id @TagName @Count @ExcerptPostId @WikiPostId < tags.xml > tags.csv

Then importing to database from command line:

sqlite3 mydb.db -csv ".import tags.csv tags"

I want something like this from the command line:

sqlite3 mydb.db -csv ".import <xml2 | 2csv command> tags"

My XML file is around 100GB so I don't want to materialize it.


Solution

  • I do the same with a 90 GB file, here is how I import it:

    xml2 < Posts.xml | 2csv row @Id @PostTypeId @AcceptedAnswerId @CreationDate \
                @Score @ViewCount @Body @OwnerUserId @LastEditorUserId \
                @LastEditorDisplayName @LastEditDate @LastActivityDate \
                @Title @Tags @AnswerCount @CommentCount @FavoriteCount \
                @CommunityOwnedDate @ContentLicense | \
    sqlite3 stack.db -csv ".import /dev/stdin posts"
    

    The conversion with xml2 happens via pipes, and the stdout of 2csv is passed directly into the stdin of sqlite3