lua lua-table deduplication

Remove duplicates from Lua table by timestamp


A few days ago I asked on Stack Overflow for help with inserting records without creating duplicates. However, the insertion process is slow, and duplicates still slip in.

I have a user base of about 10,000 players, and their data contains duplicate entries. I've been trying to filter out these duplicates without success; the examples on Stack Overflow have not panned out for me.

Here is a clip from my table

    [18] = 
                {
                    ["soldAmount"] = 25,
                    ["buyer"] = [[@playername]],
                    ["timestampz"] = 1398004426,
                    ["secsSinceEvent"] = 55051,
                    ["guildName"] = [[TradingGuild]],
                    ["eventType"] = 15,
                    ["seller"] = [[@myname]],
                },
    [19] = 
                {
                    ["soldAmount"] = 25,
                    ["buyer"] = [[@playername]],
                    ["timestampz"] = 1398004426,
                    ["secsSinceEvent"] = 55051,
                    ["guildName"] = [[TradingGuild]],
                    ["eventType"] = 15,
                    ["seller"] = [[@myname]],
                },

The timestamps match, so the second entry should not have been added. This is the insertion code:

  for k,v in pairs(sellHistory) do mSavedTHVars.Forever_Sales[k] = v
    if mSavedTHVars.Forever_Sales.timestampz ~= sellHistory.timestampz then
      table.insert(mSavedTHVars.Forever_Sales, sellHistory)
    end end

Now, I need to find out how to remove the current duplicates, and here is what I've tried.

function table_unique(tt)
  local newtable = {}
  for ii,xx in ipairs(tt) do
    if table_count(newtable.timestampz, xx) ~= tt.timestampz then
      newtable[#newtable+1] = xx
    end
  end
  return newtable
end

I hope the information provided is clear and understandable.

Thanks!

UPDATE

Attempt #20 ;)

  for k,v in pairs(mSavedTHVars.Forever_Sales) do
    if v == mSavedTHVars.Forever_Sales.timestampz then
      table.remove(mSavedTHVars.Forever_Sales,k)
    end
  end

No luck yet.

UPDATE

This has worked

  for k,v in pairs(mSavedTHVars.Forever_Sales) do mSavedTHVars.Forever_Sales[k] = v
    if v.timestampz == mSavedTHVars.Forever_Sales.timestampz then
      table.remove(mSavedTHVars.Forever_Sales, k)
    end
  end

Is this a good approach?


Solution

  • Assuming that mSavedTHVars.Forever_Sales[18] and mSavedTHVars.Forever_Sales[19] are the tables you listed in your post, the easiest way to remove all duplicates that share a timestamp is to build a "set" keyed on the timestamp (since the timestamp is your condition for uniqueness). Loop through mSavedTHVars.Forever_Sales and add each item to a new table only if its timestamp is not already in the set:

    function removeDuplicates(tbl)
        local timestamps = {}
        local newTable = {}
        for index, record in ipairs(tbl) do
            if timestamps[record.timestampz] == nil then
                timestamps[record.timestampz] = 1
                table.insert(newTable, record)
            end
        end
        return newTable
    end
    
    mSavedTHVars.Forever_Sales = removeDuplicates(mSavedTHVars.Forever_Sales)
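
To make the behaviour concrete, here is a small self-contained run of the same set-based idea on made-up records shaped like the ones in the question. The sample data and the variable names sales and deduped are assumptions for illustration only; the function is repeated so that the snippet runs on its own.

```lua
-- Same set-based approach as above, repeated so this snippet is self-contained.
local function removeDuplicates(tbl)
    local timestamps = {}   -- set of timestamps already seen
    local newTable = {}
    for _, record in ipairs(tbl) do
        if timestamps[record.timestampz] == nil then
            timestamps[record.timestampz] = true
            table.insert(newTable, record)
        end
    end
    return newTable
end

-- Hypothetical sample data: entries 1 and 2 share a timestamp.
local sales = {
    { buyer = "@playername",  soldAmount = 25, timestampz = 1398004426 },
    { buyer = "@playername",  soldAmount = 25, timestampz = 1398004426 },
    { buyer = "@someoneelse", soldAmount = 10, timestampz = 1398004500 },
}

local deduped = removeDuplicates(sales)
print(#deduped)               -- 2
print(deduped[1].timestampz)  -- 1398004426
print(deduped[2].timestampz)  -- 1398004500
```

The first record with a given timestamp is kept and later ones are dropped. If distinct sales could ever share a timestamp (worth checking against your data), the key would need to combine more fields, for example buyer and timestamp.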
    

    Update based on Question Update:

    My comment on following proposed solution:

    for k,v in pairs(mSavedTHVars.Forever_Sales) do 
      mSavedTHVars.Forever_Sales[k] = v
      if v.timestampz == mSavedTHVars.Forever_Sales.timestampz then
        table.remove(mSavedTHVars.Forever_Sales, k)
      end
    end
    

    The problem is that I don't see how that can work. When you write for k,v in pairs(mSavedTHVars.Forever_Sales) do, v is mSavedTHVars.Forever_Sales[k], so the next line, mSavedTHVars.Forever_Sales[k] = v, does nothing. Then if v.timestampz == mSavedTHVars.Forever_Sales.timestampz compares the timestamp of v, i.e. of mSavedTHVars.Forever_Sales[k], with the value of a timestampz field in mSavedTHVars.Forever_Sales itself. But the latter is a table without such a field, so the right-hand side of == is nil, and the condition can only be true if v.timestampz is nil, which I don't think is ever the case.
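
A quick way to check this reasoning is to run the same loop on a small made-up list and look at what the right-hand side evaluates to. The table sales below is a hypothetical stand-in for mSavedTHVars.Forever_Sales, and the removed counter is added only for the demonstration.

```lua
-- Hypothetical stand-in for mSavedTHVars.Forever_Sales: a list of records.
local sales = {
    { timestampz = 1398004426 },
    { timestampz = 1398004426 },  -- duplicate
    { timestampz = 1398004500 },
}

-- The list itself has no "timestampz" field; only its elements do.
print(sales.timestampz)  -- nil

local removed = 0
for k, v in pairs(sales) do
    -- v.timestampz is a number and sales.timestampz is nil: never equal.
    if v.timestampz == sales.timestampz then
        table.remove(sales, k)
        removed = removed + 1
    end
end

print(removed)  -- 0
print(#sales)   -- 3
```

Nothing is removed, so the duplicate is still in the list after the loop finishes.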

    The main reason I created a new table instead of removing duplicates from the existing one is that you cannot safely remove items from a table while iterating over it with pairs or ipairs. If you were to use a reverse counter, it would probably be OK (but I have not tested this, so test to be sure):

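    function removeDuplicates(tbl)
        local timestamps = {}
        local numItems = #tbl
        -- Walk backwards so that removing an item does not shift
        -- the items that are still to be visited.
        for index = numItems, 1, -1 do
            local record = tbl[index]
            if timestamps[record.timestampz] ~= nil then
                -- Timestamp already seen at a higher index: drop this one.
                table.remove(tbl, index)
            end
            timestamps[record.timestampz] = 1
        end
    end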
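
A short experiment illustrates why the direction of the loop matters once table.remove is involved. The helper names sample, removeForward and removeReverse, and the sample timestamps, are assumptions made for this sketch.

```lua
-- Build a fresh hypothetical list: three records share timestamp 100.
local function sample()
    return {
        { timestampz = 100 }, { timestampz = 100 }, { timestampz = 100 },
        { timestampz = 200 },
    }
end

-- Forward loop: after table.remove(tbl, i) the next element slides into
-- slot i, but ipairs moves on to i + 1, so that element is never checked.
local function removeForward(tbl)
    local seen = {}
    for i, record in ipairs(tbl) do
        if seen[record.timestampz] then
            table.remove(tbl, i)
        end
        seen[record.timestampz] = true
    end
    return tbl
end

-- Reverse loop: removing slot i only shifts elements that were already visited.
local function removeReverse(tbl)
    local seen = {}
    for i = #tbl, 1, -1 do
        local record = tbl[i]
        if seen[record.timestampz] then
            table.remove(tbl, i)
        end
        seen[record.timestampz] = true
    end
    return tbl
end

print(#removeForward(sample()))  -- 3 (one duplicate survives)
print(#removeReverse(sample()))  -- 2
```

Note that the reverse loop keeps the last record for each timestamp, whereas the new-table version keeps the first. For true duplicates that makes no practical difference.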
    

    Also, I think the intent of this version is not as clear, but maybe that is just personal preference.
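
On the insertion side of the question, the same set idea could be used to stop duplicates from being added in the first place. This is only a sketch: appendNewSales, history and incoming are hypothetical names, and it assumes the incoming data is a list of records that each carry a timestampz field.

```lua
-- Hypothetical helper: append records from `incoming` to `history`,
-- skipping any whose timestampz is already present.
local function appendNewSales(history, incoming)
    -- Build the set of timestamps already stored.
    local seen = {}
    for _, record in ipairs(history) do
        seen[record.timestampz] = true
    end

    local added = 0
    for _, record in ipairs(incoming) do
        if not seen[record.timestampz] then
            seen[record.timestampz] = true
            table.insert(history, record)
            added = added + 1
        end
    end
    return added
end

local history  = { { timestampz = 1398004426 } }
local incoming = { { timestampz = 1398004426 }, { timestampz = 1398004500 } }

print(appendNewSales(history, incoming))  -- 1
print(#history)                           -- 2
```

Rebuilding the set on every call costs one pass over the stored history. If that turns out to be slow with many records, the set could be kept alongside the saved table instead.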