I am trying to extract all the column names from a given SQL query (MySQL to start with). I am using the sqlparse module in Python.
I have some simple code that collects the columns until the "FROM" keyword is reached. How do I get the column names from the rest of the query?
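    import sqlparse
    from sqlparse.sql import Identifier, IdentifierList
    from sqlparse.tokens import Keyword

    def parse_sql_columns(sql):
        columns = []
        parsed = sqlparse.parse(sql)
        stmt = parsed[0]
        for token in stmt.tokens:
            # A comma-separated select list is grouped into an IdentifierList
            if isinstance(token, IdentifierList):
                for identifier in token.get_identifiers():
                    columns.append(str(identifier))
            # A single selected column appears as a lone Identifier
            if isinstance(token, Identifier):
                columns.append(str(token))
            if token.ttype is Keyword:  # from
                break
        return columns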
Sample query:

    string2 = "SELECT test, ru.iuserid AS passengerid, ru.vimgname FROM ratings_user_driver AS rate LEFT JOIN trips AS tr ON tr.itripid = rate.itripid LEFT JOIN register_user AS ru ON ru.iuserid = tr.iuserid WHERE tr.idriverid='5083' AND tr.iactive='Finished' AND tr.ehailtrip='No' AND rate.eusertype='Passenger' ORDER BY tr.itripid DESC LIMIT 0,10;"

Expected output:

    ["test", "ru.iuserid AS passengerid", "ru.vimgname", "tr.itripid", "rate.itripid", "ru.iuserid", "tr.iuserid", "tr.idriverid", "tr.iactive", "tr.ehailtrip", "rate.eusertype", "tr.itripid"]

One other issue I am facing is that the "WHERE" clause is not treated correctly by this parser. It is not identified as a keyword, and because of that I am unable to extract information from it cleanly.
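You can use SQLGlot to do this. It parses the whole statement into a syntax tree, so columns in the JOIN, WHERE, and ORDER BY clauses are found as well as those in the SELECT list.

    import sqlglot
    import sqlglot.expressions as exp

    # sql holds the query string, e.g. string2 from the question
    for column in sqlglot.parse_one(sql).find_all(exp.Column):
        print(column.sql())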