pythonsqlregexmariadbkey

Delete all Occurences of a Substring in SQL-Statement in Python


I have a file from a mariadb containing 3GBs of SQL-Statements. Problem is, that my SQLlite DB doesn't support the contained Key-Statements.

Is there a way to edit the Strings containing the Statements that cuts out all substrings or statements that contain the word "key"?

I tried the following regex pattern :

\n(.*Key.*)

to filter out the key statements. Any other more efficient way of doing this in Python or with other tools?

Input:

CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255),
Primary Key(`PersonID`),
Foreign Key(`City`)
);

Desired Output:

CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);

Solution

  • We can use the python's 're' module (for regular expressions) to remove substrings that follow the pattern 'Key'.

    import re
    
    # your input
    input_string = """
    CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255),
    Primary Key(`PersonID`),
    Foreign Key(`City`)
    );
    """
    
    # pattern to look for 
    pattern = r".*Key\([^)]*\)\s*,|.*Key\([^)]*\)\s*"
    
    # remove all substrings that match the pattern
    output_string = re.sub(pattern, "", input_string)
    
    # print output
    print(output_string)
    

    Output:

    CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
    );
    

    To find out what regular expression you need, you can use this site RegExr to test until you get your needed expression.