I want to write a python function that does this efficiently:
The function will take two strings, 'a' and 'b', and attempt to find the longest palindromic string that can be formed such that it is a concatenation of a non-empty substring of 'a' and a non-empty substring of 'b'. If there are multiple valid answers, it will return the lexicographically smallest one. If no such string can be formed, it will return '-1'.
I have an inefficient solution that generates all the substrings of both strings, and then creates all possible concatenations whle tracking the longest which is a valid palindrome:
def is_palindrome(word):
"""Check if a word is a palindrome."""
reversed_word = word[::-1]
return word == reversed_word
def all_substrings_of_word(word):
"""Generate all possible non-empty substrings of a given string."""
substrings = []
for sub_string_length in range(1, len(word) + 1):
for i in range(len(word) - sub_string_length + 1):
new_word = word[i:i + sub_string_length]
substrings.append(new_word)
return substrings
def buildPalindrome(a, b):
"""Attempt to find the longest palindromic string created by concatenating
a substring of `a` with a substring of `b`."""
sub_strings_a = all_substrings_of_word(a)
sub_strings_b = all_substrings_of_word(b)
# Generate all possible concatenations of substrings from `a` and `b`
multiplexed_array = [
word_a + word_b for word_a in sub_strings_a for word_b in sub_strings_b]
# Find the best palindrome (longest, then lexicographically smallest)
best_palindrome = ""
for word in multiplexed_array:
if is_palindrome(word):
if len(word) > len(best_palindrome):
best_palindrome = word
elif len(word) == len(best_palindrome) and word < best_palindrome:
best_palindrome = word
return best_palindrome if best_palindrome else "-1"
print(buildPalindrome("bac", "bac")) # EXPECTED OUTPUT -- aba
print(buildPalindrome("abc", "def")) # EXPECTED OUTPUT -- -1
print(buildPalindrome("jdfh", "fds")) # EXPECTED OUTPUT -- dfhfd
Can I please get an explanation on how this can be improved?
You could take this approach:
Build a trie for all substrings in b
. A suffix tree would be even better, as it is more efficient.
Consider all possible "centers" for potential palindromes in the string a
. So these can be between two consecutive characters (when palindrome has an even size) or on a character (when palindrome has an odd size). For each of these centers do:
Find the largest palindrome p
at that center only considering string a
extend p
to the left as long as the added characters (in the order of being added) are a word in the trie of b
. This is a potential solution. Compare it with the longest palindrome so far to retain the longest.
If it is not possible to extend p
in this way, then shorten p
until a character is so removed that exists in b
. In that case we have a potential solution.
If in the latter case there are no characters in p
that occur in b
, then we have no suitable palindrome at the chosen center.
Then turn the tables and apply the above procedure where a
becomes the reversal of b
, and b
the reversal of a
. This practically means we search for palindrome centers in the original b
.
Here is an implementation of that idea:
# Of the two given strings choose the one that is longest, or if equal, comes first in lexical order
def longest(x, y):
return min((-len(x), x), (-len(y), y))[1]
def buildTrie(s):
trie = {}
for i in range(len(s)):
node = trie
for j in range(i, len(s)):
node = node.setdefault(s[j], {})
return trie
def buildPalindromeTrincot(s1, s2):
palin = ""
# Two passes: one for where the center of a palindrome is in s1, one for the reverse case
for a, b in ((s1, s2), (s2[::-1], s1[::-1])):
# Build a trie for B substrings (could be suffixtree for better performance)
trie = buildTrie(b)
n = len(a)
# Visit all possible centers in A for a potential solution of at least 2 characters
for center in range(2*n-1, 0, -1):
# Get the offsets of the innermost characters that must be equal
# for a palindrome of at least two characters
mid1 = (center - 1)//2
mid2 = (center + 2)//2
# Get largest palindrome at this center in A alone:
# `left` will point to the left-neighboring character to that palindrome
left = next((left for left, right in zip(range(mid1, 0, -1), range(mid2, n))
if a[left] != a[right]),
max(0, mid1 + mid2 - n))
# Must extend the palindrome with a substring from B
node = trie.get(a[left], None)
if node is not None: # We can extend the palindrome using B
for left in range(left-1, -1, -1):
if a[left] not in node:
left += 1
break
node = node[a[left]]
else:
# See if we can drop characters from the palindrome in A
# until we can replace one with the same character from B
left = next((left for left in range(left+1, mid1+1) if a[left] in trie), None)
if left is None:
continue # No solution found here
palin = longest(a[left:mid2] + a[left:mid1+1][::-1], palin)
return palin or "-1"
For input strings of around 40, this implementation runs 100 times faster than the original code you provided. For inputs of size 70, that becomes a factor of 1000 times. For inputs with strings of size 500, this implementation returns an answer in less than a second.
There is still room for improvement, like:
a+b
and "fan" outward, so that a larger palindrome is found sooner. For this you'll need to have both tries built first as you'll toggle between one and the other.But as the above code already brought a dramatic improvement, I didn't pursue any of these improvements.