jsonpandasdataframedatatabletabular-form

Converting .txt file to load into dataframe


I have a text file (.txt) something like this:

{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-12-10T02:14:50", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
.
.
.
.
.
.
.

How to load it into a dataframe?


Solution

  • Every line looks to be a separate json object. Using just Python:

    import pandas as pd
    import json
    
    with open('data.json') as f:
        lines = f.readlines()
    
    data = []
    for line in lines:
        data.append(json.loads(line))
    
    df = pd.DataFrame(data)
    
    df
    

    Looks like

    accountNumber   customerId  creditLimit availableMoney  transactionDateTime transactionAmount   merchantName    acqCountry  merchantCountryCode posEntryMode    posConditionCode    merchantCategoryCode    currentExpDate  accountOpenDate dateOfLastAddressChange cardCVV enteredCVV  cardLast4Digits transactionType echoBuffer  currentBalance  merchantCity    merchantState   merchantZip cardPresent posOnPremises   recurringAuthInd    expirationDateKeyInMatch    isFraud
    0   737265056   737265056   5000.0  5000.0  2016-08-13T14:27:32 98.55   Uber    US  US  02  01  rideshare   06/2023 2015-03-14  2015-03-14  414 414 1803    PURCHASE        0.0             False           False   False
    1   737265056   737265056   5000.0  5000.0  2016-10-11T05:05:54 74.51   AMC #191138 US  US  09  01  entertainment   02/2024 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             True            False   False
    2   737265056   737265056   5000.0  5000.0  2016-11-08T09:18:39 7.47    Play Store  US  US  09  01  mobileapps  08/2025 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             False           False   False
    3   737265056   737265056   5000.0  5000.0  2016-12-10T02:14:50 7.47    Play Store  US  US  09  01  mobileapps  08/2025 2015-03-14  2015-03-14  486 486 767 PURCHASE        0.0             False           False   False
    

    If the file were just one json object rather than a new json object every line, you could just use pandas.read_json(file_path)