I have been trying to run this query to find matching accounts and every time the query is ran the HUE environment gets to 75% done and then just stays there. I am not sure how to troubleshoot as I have been reading forums and such trying to figure out what is the issue. I can't use COMPUTE STATS as views are not allowed.
SELECT
d.eid,
d.loyaltyProgramId,
d.playerAccountNumber,
dcp1.PhoneNumber,
dcp1.IsPrimary,
dcp1.IsPreferredContactNumber,
d.firstName,
d.LastName,
d.BirthDate,
d.Gender,
d.IsBanned,
d.bancode,
dc2_cust.eid,
dc2_cust.loyaltyprogramid,
dc2_cust.playerAccountNumber,
dcp2_cust.PhoneNumber,
dc2_cust.FirstName,
dc2_cust.LastName,
dc2_cust.BirthDate,
dc2_cust.Gender,
dc2_cust.IsBanned,
dc2_cust.BanCode,
CONCAT(d.playeraccountnumber, '-', d.LoyaltyProgramId, ',', dc2_cust.playeraccountnumber, '-', dc2_cust.loyaltyprogramid) AS killkey
FROM gmscompliance_ref.ballybi_dcustomer d
JOIN gmscompliance_ref.ballybi_dcustomerphone dcp1 ON d.customerkey = dcp1.customerkey AND d.loyaltyprogramid = dcp1.loyaltyprogramid
JOIN gmscompliance_ref.ballybi_dcustomerphone dcp2_cust ON dcp1.customerkey < dcp2_cust.customerkey and (translate(dcp1.PhoneNumber, '-', ' ') = translate(dcp2_cust.PhoneNumber, '-', ' ') OR dcp1.PhoneNumber = dcp2_cust.PhoneNumber)
JOIN gmscompliance_ref.ballybi_dcustomer dc2_cust ON dcp2_cust.customerkey = dc2_cust.customerkey AND dcp2_cust.loyaltyprogramid = dc2_cust.loyaltyprogramid
WHERE
d.PlayerAccountStatus = 'Active'
AND dc2_cust.PlayerAccountStatus = 'Active'
AND d.eid <> 0
AND d.LoyaltyProgramId <> 'GEO'
AND d.FirstName = dc2_cust.FirstName
AND d.eid <> dc2_cust.Eid
ORDER BY d.eid;
With impala on hue, the status bar says 75% is the time it took impala to read the file, not 75% of all the steps. Try going to the query plan section (Click to the query id on the right of status bar, eg: fb4a404290538b7d:9c46dee100000000 ) to see which step is slow. With your query, I think you should pay attention to the condition that joining the table "dcp2_cust" can lead to a data explosion.