I am collecting data on measured parts from each month into a single file for the whole year. Using a "query from folder" I am able to get all the data together, formatted, and sorted, with one exception. Every part has an "A" and a "B" version. Unfortunately, due to production order, the "B" part is sometimes measured before the "A" part. In that case I do not want to sort by time, because the order would then go, for example, A,B,A,B,B,A,A,B,A,B. I want the "A" part always placed before the "B" part. Parts are measured twice per day, so I cannot sort by day and then part letter, because the order would then go, for example, A,A,B,B,A,A,B,B. How can I sort the data by day, then by time, but override the time order within each pair so the A,B,A,B pattern is kept?
To further complicate things, sometimes the data collector messes up and mislabels one of the parts. In this case it would sort as, for example, A,B,A,B,A,A,A,B,A,B. How can I find this error and correct it automatically before pasting the consolidated data into a table?
(Data has been oversimplified for confidentiality reasons)
You can see on May 2nd in the morning A/B are reversed because B data was taken before A data. Sorting the data by time messed up the order.
You can see on April 2nd in the morning (1PM is morning shift) there are two A parts when one of them should be B (for this error we can assume they were taken in the order of "A" before "B" so the time of data collection applies).
I am new to using queries and honestly struggling hard on this one. Please help me not only solve this problem but also understand it.
Here are text versions of the data:
Apr
Date | Time | Letter | Data |
---|---|---|---|
4/1/2024 | 7:25:08 AM | A | 0.7 |
4/1/2024 | 7:30:56 AM | B | 0.5 |
4/1/2024 | 8:32:51 PM | A | 0.6 |
4/1/2024 | 8:36:44 PM | B | 0.5 |
4/2/2024 | 1:32:59 PM | A | 1 |
4/2/2024 | 1:38:36 PM | A | 0.5 |
4/2/2024 | 8:46:11 PM | A | 0.7 |
4/2/2024 | 8:51:31 PM | B | 0.7 |
May
Date | Time | Letter | Data |
---|---|---|---|
5/1/2024 | 1:35:12 PM | A | 0.6 |
5/1/2024 | 1:39:05 PM | B | 0.4 |
5/1/2024 | 6:07:11 PM | A | 0.8 |
5/1/2024 | 6:10:43 PM | B | 0.5 |
5/2/2024 | 10:59:32 AM | A | 0.8 |
5/2/2024 | 8:42:16 AM | B | 0.1 |
5/2/2024 | 6:15:07 PM | A | 0.4 |
5/2/2024 | 6:18:40 PM | B | 0.2 |
YTD (Current output)
Date | Time | Letter | Data |
---|---|---|---|
4/1/2024 | 7:25:08 AM | A | 0.7 |
4/1/2024 | 7:30:56 AM | B | 0.5 |
4/1/2024 | 8:32:51 PM | A | 0.6 |
4/1/2024 | 8:36:44 PM | B | 0.5 |
4/2/2024 | 1:32:59 PM | A | 1 |
4/2/2024 | 1:38:36 PM | A | 0.5 |
4/2/2024 | 8:46:11 PM | A | 0.7 |
4/2/2024 | 8:51:31 PM | B | 0.7 |
5/1/2024 | 1:35:12 PM | A | 0.6 |
5/1/2024 | 1:39:05 PM | B | 0.4 |
5/1/2024 | 6:07:11 PM | A | 0.8 |
5/1/2024 | 6:10:43 PM | B | 0.5 |
5/2/2024 | 8:42:16 AM | B | 0.1 |
5/2/2024 | 10:59:32 AM | A | 0.8 |
5/2/2024 | 6:15:07 PM | A | 0.4 |
5/2/2024 | 6:18:40 PM | B | 0.2 |
YTD (Desired output)
Date | Time | Letter | Data |
---|---|---|---|
4/1/2024 | 7:25:08 AM | A | 0.7 |
4/1/2024 | 7:30:56 AM | B | 0.5 |
4/1/2024 | 8:32:51 PM | A | 0.6 |
4/1/2024 | 8:36:44 PM | B | 0.5 |
4/2/2024 | 1:32:59 PM | A | 1 |
4/2/2024 | 1:38:36 PM | B | 0.5 |
4/2/2024 | 8:46:11 PM | A | 0.7 |
4/2/2024 | 8:51:31 PM | B | 0.7 |
5/1/2024 | 1:35:12 PM | A | 0.6 |
5/1/2024 | 1:39:05 PM | B | 0.4 |
5/1/2024 | 6:07:11 PM | A | 0.8 |
5/1/2024 | 6:10:43 PM | B | 0.5 |
5/2/2024 | 10:59:32 AM | A | 0.8 |
5/2/2024 | 8:42:16 AM | B | 0.1 |
5/2/2024 | 6:15:07 PM | A | 0.4 |
5/2/2024 | 6:18:40 PM | B | 0.2 |
I used a simple Power Query as seen here and then, in order, changed all the data types to the correct type, sorted by date, sorted by time, removed the source name, and removed duplicates.
I cannot rely on file names to sort the data and keep it as it is within the file, because I am pulling data from sheets that all have the same name but sit in their own monthly folders. Folders sort alphabetically, so the order of the months would be wrong if I didn't sort it manually.
This code seems to work with your data. Could be modified depending on your actual setup.
Assumes each month's table is read in as a separate query, and the data types are properly set.
Add a blank query
Paste the code below into the Advanced Editor
Rename that query according to the code comments
//Rename: fnFixLetter
(tbl as table)=>
let
//Sort by date and time, ascending
#"Sorted Rows" = Table.Sort(tbl,{{"Date", Order.Ascending}, {"Time", Order.Ascending}}),
//ASSUMPTION: there are NO missing entries, so grouping by pairs after sorting by date and time will always return the relevant rows
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type),
#"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
//Remove unneeded Index Column
#"Removed Columns" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
//Group by Integer-Divide (the pairs)
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Integer-Division"}, {
//Do the magic
{"all", (t)=> let
ltrs = t[Letter],
//if the letters are identical then they should be {"A","B"}
order = if List.Count(List.Distinct(ltrs)) = 1 then {"A","B"} else ltrs,
//Replace the Letters column with either {"A","B"} or leave what was there
tbl= Table.FromColumns(
Table.ToColumns(
Table.RemoveColumns(t,{"Letter","Integer-Division"}))
& {order},{"Date","Time","Data","Letter"}),
//resort each pair by Letter
reSort = Table.Sort(tbl,{"Letter", Order.Ascending})
in reSort,
type table[Date=nullable date, Time=nullable time,Data=nullable number, Letter=nullable text]
}}),
//Cleanup
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Integer-Division"}),
#"Expanded all" = Table.ExpandTableColumn(#"Removed Columns1", "all", {"Date", "Time", "Data", "Letter"}),
#"Reorder Columns" = Table.ReorderColumns(#"Expanded all", {"Date","Time","Letter","Data"})
in
#"Reorder Columns"
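Since the function depends on the stated assumption that no entries are missing, it may be worth checking that assumption before trusting the result. The following is only a minimal sketch and is not part of the function above. The inline `Sample` table (rows taken from the question's April data, with the 4/1/2024 evening "B" row left out on purpose) and the step and column names `Pair`, `Rows`, `Dates`, and `Problems` are made up for illustration. It numbers the sorted rows in pairs the same way the function does and lists any pair that does not have exactly two rows or that spans two dates.

```powerquery
let
    // Hypothetical sample: April rows from the question, with the
    // 4/1/2024 evening "B" row removed to simulate a missing measurement
    Sample = #table(
        type table [Date = date, Time = time, Letter = text, Data = number],
        {
            {#date(2024, 4, 1), #time(7, 25, 8), "A", 0.7},
            {#date(2024, 4, 1), #time(7, 30, 56), "B", 0.5},
            {#date(2024, 4, 1), #time(20, 32, 51), "A", 0.6},
            {#date(2024, 4, 2), #time(13, 32, 59), "A", 1},
            {#date(2024, 4, 2), #time(13, 38, 36), "A", 0.5}
        }
    ),
    // Same sort and pairing steps as in fnFixLetter
    Sorted = Table.Sort(Sample, {{"Date", Order.Ascending}, {"Time", Order.Ascending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1, Int64.Type),
    // Rows 0,1 -> pair 0; rows 2,3 -> pair 1; and so on
    Paired = Table.AddColumn(Indexed, "Pair", each Number.IntegerDivide([Index], 2), Int64.Type),
    // Summarize each pair: its row count and its number of distinct dates
    Summary = Table.Group(Paired, {"Pair"}, {
        {"Rows", each Table.RowCount(_), Int64.Type},
        {"Dates", each List.Count(List.Distinct([Date])), Int64.Type}
    }),
    // Keep only suspicious pairs: not exactly two rows, or rows from different dates
    Problems = Table.SelectRows(Summary, each [Rows] <> 2 or [Dates] <> 1)
in
    Problems
```

With this sample, pair 1 should be flagged because it mixes 4/1/2024 and 4/2/2024, and pair 2 because it has only one row. An empty result suggests the pairing assumption holds. The check shows where the pairing starts to drift, not which row is actually missing.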
Now, in a new query, combine all of the existing tables
let
//List of all the tables to be combined, in order
tbls = {April, May},
append = List.Accumulate(
tbls,
#table({},{}),
(s,c)=> Table.Combine({s,fnFixLetter(c)})
)
in
append
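The append step could probably also be written without the accumulator. This is a sketch of that variant, assuming the `fnFixLetter` function and the `April` and `May` queries exist under exactly those names as in the code above; the step name `fixed` is made up for illustration.

```powerquery
let
    // Same list of monthly queries as above, in the desired order
    tbls = {April, May},
    // Apply fnFixLetter to every table in the list
    fixed = List.Transform(tbls, fnFixLetter),
    // Append the corrected tables in list order
    append = Table.Combine(fixed)
in
    append
```

Because the list order decides the output order, adding another month means adding its query name to `tbls` in the right position, which also sidesteps the alphabetical folder order mentioned in the question.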