Assigning codes to nodes such that codes are unique and create two groups which share common properties

We are given N, and we know there are 1..N objects in group 1 and N+1...2*N objects in group 2. Every object in group 1 is connected with each other, and every object in group 2 is connected with each other (inside group).

There is a list consisting of length p (1 <= p <= N^2) of numbers [a, b], where a is object from group 1 and b is object from group 2. The pair [a, b] means that there must be connection between object a and b. If there is no [a, b] pair in the list, then there cannot be connection between them.

For every object, we need to set unique code representing this object, which is string constisting of characters {A, B, C} with length up to M and N + 1 <= M <= 3N. Given two codes x and y of equal length, if there is index i such that x_i == y_i, then there is connection between object represented by code x and y.

I tried solving it starting with setting characters in codes that will create required connections between objects of two distinct groups.

#include <iostream>
#include <vector>
#include <set>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);

    int n, p, M;
    cin >> n >> p >> M;

    set<int> groups;
    set<pair<int, int>> connections;
    for (int i = 0; i < p; ++i)
    {
        int a, b;
        cin >> a >> b;

        groups.insert(b);
        connections.insert({a, b});
    }

    vector<string> codes(2 * n + 1, string(M, ' '));
    int pos = 1;
    set<int> a_pos;
    for (const int b : groups)
    {
        for (int a = 1; a <= n; ++a)
        {
            if (connections.find({a, b}) == connections.end())
                continue;

            codes[a][pos] = 'A';
            codes[b][pos] = 'A';
        }

        ++pos;
        a_pos.insert(pos);
    }
}

To use as minimum characters as possible to create required connections, I grouped connections basing on their endpoints (objects from group 2). This way, assume N = 3, if we have connections e.g. 1 -> 4, 2 -> 4, we can use one character at some position to connect both 1 and 2 to 4. Also, I assumed my codes will always be of maximum possible length.

However, I have problem to come up with logic that will ensure uniqueness of every code. We can see that if we imagine the connections between objects as a graph, then objects become nodes, there are two main components (group 1 and group 2), and then we have connections between these components, so it's a bit similar to a bipartite graph, but this observation didn't help me.

How can we solve that?

That's problem Satelity from XXXI Polish Olympiad in Informatics, stage I

Solution

It's tricky, but you can always do this with codes of length N+1 or less.

First, partition each group into equivalence classes, where two objects are considered equivalent if they connect to the same set of other objects.

In each group, we count the number of equivalence classes that have connections to the other group. The group with the smallest number will be designated the "target" group T, and the other group will be designated S.

Let E be the number of equivalence classes with connections in T. We assign the first E code symbols as follows:

Each symbol position corresponds to an equivalence class with connections in T
For each object in group T, the code has B in the position for its equivalence class (if it has connections), and A elsewhere.
For each object in group S, the code has B for all the equivalence classes in T that it connects to, and C elsewhere.

Using E symbol positions, we have now accurately represented all of the connections between groups. We need to use the remaining positions to connect all T objects to each other, connect all S objects to each other, and to distinguish objects in the same equivalence class, because they have the same codes so far.

Let C be the size of the largest equivalence class. We always have E <= N+1-C, and so we have at least C remaining symbol positions.

Easy case

If C=1 (each object has distinct connections), then we only need to use one symbol position to join each group. Set it to A for group T and B for group S.

Still easy case

If E <= N+1 - 2ceil(log₂ C), then we have at least 2ceil(log₂ C) positions available.

We allocate ceil(log₂ C) positions for each group to record a unique index i to distinguish objects in the same equivalence class. Within the group, its positions encode the index as a binary number using B and C, and set all the other group's positions to A. This ensures that each group is connected within itself by the As, and not mistakenly connected to the other group, because the As won't match B or C.

Note that this easy case almost always applies. The only times it doesn't apply are special cases with C=3 or C=5.

Hard cases C=3

If C=3 and the case is non-easy, then both groups must have N-2 equivalence classes, all with connections, leaving 3 positions to record the remaining information. If only one of the groups has a class of size 3 (let's say it's S), then encode the remaining positions like this (similar to the easy cases):

 Group  i  p0 p1 p2
--------------------
   S    0   A  B  B
   S    1   A  B  C
   S    2   A  C  B
   T    0   B  A  A
   T    1   C  A  A

Otherwise, both groups have exactly one class of size 3, and all the rest are size 1. If the two big classes are connected, then encode the remaining positions like this:

 Group  i  p0 p1 p2
--------------------
   S    0   A  A  A
   S    1   A  A  C
   S    2   A  C  A
   T    0   B  B  B
   T    1   B  B  C
   T    2   B  C  B

The only case left is when the big classes are disconnected. Find one T class that the big S class connects to, and give it an extra position. Now there are 3 distinct ways to connect to that T class, and they can be used to distinguish the 3 members of the S class. The remaining positions are encoded like so to distinguish the members of the T class:

 Group  i  p0 p1 
------------------
   T    0   B  B 
   T    1   B  C 
   T    2   C  B 
   S    x   A  A

Note that even though these assignments don't connect the two object in group T with i = 1 or 2, they will still be connected, because they connect to all the same objects in S.

Hard Cases C=5

If C=5 and the case is non-easy, then both groups must have N-4 equivalence classes, all with connections, leaving 5 positions to record the remaining information. If only one of the groups has a class of size 5 (let's say it's S), then encode the remaining positions like this:

 Group  i  p0 p1 p2 p3 p4
--------------------------
   S    0   B  B  B  A  A
   S    1   B  B  C  A  A
   S    2   B  C  B  A  A
   S    3   B  C  C  A  A
   S    4   C  B  B  A  A
   T    0   A  A  A  B  B
   T    1   A  A  A  B  C
   T    2   A  A  A  C  B
   T    3   A  A  A  C  C

Otherwise, both groups have exactly one class of size 5, and the rest are size 1. If the two big classes are connected, then assign the remaining positions like this:

 Group  i  p0 p1 p2 p3 p4
--------------------------
   S    0   A  A  A  A  A
   S    1   A  A  A  A  C
   S    2   A  A  A  C  A
   S    3   A  A  A  C  C
   S    4   A  A  C  A  A
   T    0   B  B  B  B  B
   T    1   B  B  B  B  C
   T    2   B  B  B  C  B
   T    3   B  B  B  C  C
   T    4   B  B  C  B  B

Finally, in the last hard case, both groups have exactly one class of size 5, and the two big classes are not connected. In this case, find one T class that the big S class connects to, and give it 2 extra positions. Now there are 7 distinct ways to connect to that T class, and they can be used to distinguish the 5 members of the S class. The remaining 3 positions are encoded like so to distinguish the members of the T class:

 Group  i  p0 p1 p2
--------------------
   T    0   B  B  B
   T    1   B  B  C
   T    2   B  C  B
   T    3   B  C  C
   T    4   C  B  B
   S    x   A  A  A

Conclusion

That completes the non-easy cases, all of them covered with N+1 symbols.