azure, azure-data-lake, u-sql

How to skip first n rows in U-SQL job?


I want to run a U-SQL job to load data from a .txt file into a SQL table in Azure Data Lake Store. I have already created the database, schema, and table in Azure Data Lake Analytics.

The data in the .txt file is tab-delimited, and I need to skip the first 2 rows. I think I should use the built-in Extractors.Text() extractor, but how do I add the skipFirstNRows parameter to it to extract the data?


Solution

  • You just pass it to the extractor like this:

    @searchlog =
     EXTRACT UserId          int,
             Start           DateTime,
             Region          string,
             Query           string,
             Duration        int?,
             Urls            string,
             ClickedUrls     string
     FROM "/Samples/Data/SearchLog.tsv"
     USING Extractors.Tsv(skipFirstNRows: 2);
    

    I based the example on the TSV extractor as that one defaults to a tab as the delimiter.

    (source)
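
    Since the question asks about Extractors.Text() specifically, here is a rough sketch of the same extraction with that extractor. Text() defaults to a comma delimiter, so the tab has to be passed explicitly. The table name MyDb.dbo.SearchLog is hypothetical and only stands in for the table you already created; its columns are assumed to match the extracted ones.

    // Same schema as above, but using the generic Text extractor.
    // delimiter must be set to a tab because Text() defaults to a comma.
    @searchlog =
     EXTRACT UserId          int,
             Start           DateTime,
             Region          string,
             Query           string,
             Duration        int?,
             Urls            string,
             ClickedUrls     string
     FROM "/Samples/Data/SearchLog.tsv"
     USING Extractors.Text(delimiter: '\t', skipFirstNRows: 2);

    // Load the extracted rows into the existing table.
    // MyDb.dbo.SearchLog is a placeholder name, not taken from the question.
    INSERT INTO MyDb.dbo.SearchLog
    SELECT UserId,
           Start,
           Region,
           Query,
           Duration,
           Urls,
           ClickedUrls
    FROM @searchlog;

    Both forms should skip the same two rows; Tsv() is simply a shorthand for Text() with the tab delimiter preset.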