I need some help in this.. I have 3 external tables:
create external table ext.titanic
(
PassengerId INT,
Pclass INT,
Pname VARCHAR(100),
Gender VARCHAR(20),
Ticket VARCHAR(30),
Cabin VARCHAR(30)
)
WITH (LOCATION='/titanic.csv',
DATA_SOURCE = blob1,
FILE_FORMAT = TextFileFormat1,
);
create external table ext.titanic2
(
Pclass INT,
Pname VARCHAR(100)
)
WITH (LOCATION='/titanic2.csv',
DATA_SOURCE = blob1,
FILE_FORMAT = TextFileFormat1,
);
create external table ext.titanic3
(
PassengerId INT,
Pname VARCHAR(100),
)
WITH (LOCATION='/titanic3.csv',
DATA_SOURCE = blob1,
FILE_FORMAT = TextFileFormat1,
);
and i have dbo table created:
CREATE TABLE dbo.titanic
WITH
(
DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
titanic.PassengerId,
titanic.Pclass,
titanic.Pname,
titanic.Gender,
titanic.Ticket,
titanic.Cabin,
titanic3.PassengerId as T3_PassengerId,
titanic3.Pname as T3_Pname,
titanic2.Pclass as T2_Pclass,
titanic2.Pname as T2_Pname
FROM ext.titanic
FULL JOIN ext.titanic2 ON ext.titanic2.PassengerId=ext.titanic.PassengerId
FULL JOIN ext.titanic3 ON ext.titanic3.Pclass=ext.titanic.Pclass;
I have to join them and update the dbo.titanic with a stored procedure
do i need additional ext.table to join them there and after that to merge it with dbo.titanic?
or there is a easy and simple way to do that?
also i need more help for the dbo.titanic and joins..
there are more unique PassengerIds in titanic3 than in titanic, but i need all PassengerIds from the 2 tables to be in one column.. same for Pclass from both tables... that is bugging me
just for reference - titanic table has around 100000 rows(800 unique PassengerIDs) and titanic2 and titanic3 have 5000 unique (total)rows for PassengerId and Pclass.
The final table must look like dbo.titanic but without T3_PassengerId and T2_Pclass as they must be merged somehow in the PassengerId and Pclass.
I lost a lot of time looking for something like that, but didn't find anything close enough.
This is the best I could find:
and I want to thank the guy that wrote this, but to use it, I have 3 main issues :
Can I use something like this
INSERT INTO table1(column1, column2,...) SELECT column1, column2,... FROM table2 WHERE condition( compare value in table1 <> value in table 2)
thanks in advance
I got this...
may be not the most elegant way but it works.. using left join, with additional stg.titanic table (same as dbo.titanic) that combines the 3 external tables.. then merge stg. and dbo. tables..
MERGE dbo.titanic AS [Target]
USING (SELECT
column1,2,3
UpdateTime
from stg.titanic) AS [Source]
ON [Target].PassengerId = [Source].PassengerId
and [Target].Pclass = [Source].Pclass
and [Target].Pname = [Source].Pname --- specifies the condition
WHEN MATCHED THEN
UPDATE SET [Target].UpdateTime = GetDate()
WHEN NOT MATCHED THEN
INSERT (column1,2,3 --- when one of the 3 conditions is not met then insert new row
UpdateTime)
VALUES (
[Source].column1,2,3
[Source].UpdateTime
);
if someone knows a better way it will be good to share with us
Thanks.