Solution
-
Uses Topological Sort
-
Create a graph from the given rules
-
If there's a cycle in this graph, return an empty string because a valid answer is not possible
- See the example below: does w come before e, or does e come before w? We don't know. We cannot know.
-
The first part of the code is just bookeeping, checking for boundary conditions, and setting up the adjacency list
adj = defaultdict(set)
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ''
for j in range(min_len):
if w1[j] != w2[j]:
adj[w1[j]].add(w2[j])
break
- Loop though each word pair
- If two pairs have the same suffix but the longer word comes before the shorter word, immediately return because this is not lexicographically possible.
- We know that the words are in sorted order already
- If there are two words "app" and "apple" but apple comes before app, we immediately return an empty string
- We maintain a adjacency list where the key is the character and the value is a set of characters that come AFTER the key character
- Loop through each character in both the words, when a character that doesn't match is seen, we add the character from the second word (remember, the words are sorted) to the the set
The next part of the code is the graph traversal
def dfs(node):
if node in visiting:
return False
if node in visited:
return True
visiting.add(node)
for nbr in adj[node]:
if not dfs(nbr):
return False
visited.add(node)
path.append(node)
visiting.remove(node)
return True
for ch in set(''.join(words)):
if not dfs(ch):
return ''
return ''.join(path[::-1])
- We do something called post-order DFS, where instead of adding a character immediately to the output variable, we add it once we finished dealing with all of it's neighbours.
- Why are we doing post-order DFS? Because if we do a regular DFS, we have a chance to get an invalid ordering:
- Here, ABC would be the correct order, not ACB
- We will get ACB if we do regular DFS, i.e. add the value to path immediately once visited
- We will get ABC (the correct ordering) if we do post-order DFS, i.e. add the value to the path only after we deal with all it's neighbours
- We need to maintain two sets called visited and visiting
visited is to recur out of the DFS if a character we've seen already is seen, we've already dealt with this character before and hence no need to process it againvisiting is to check for cycles in the graph. If we visit a node and then visit it again, return False because if a loop exists, we cannot figure out the ordering of characters- Do the Depth-first Search on each character, and if at any point we return False, return an empty string
- Else, return the path, but remember to reverse it since we did a post-order DFS.
- Why do we need to do it for each character:
- The graph is not guaranteed to be a single component:
- Given example
words = ["zy", "zx"], the only graph would be:
- How can z come into the picture?
- Only be looping through each character can z come here
- Because from the given input, it is immediately clear that y comes before x.
- But where does z come from? It has to come after both x and y because there's no other ordering mentioned and the only other place it can come is after both x and y (I think)
| Time | Space | Explanation |
|---|
O() | O() | |
def alienOrder(self, words: List[str]) -> str:
adj = defaultdict(set)
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ''
for j in range(min_len):
if w1[j] != w2[j]:
adj[w1[j]].add(w2[j])
break
visited, visiting, path = set(), set(), []
def dfs(node):
if node in visiting:
return False
if node in visited:
return True
visiting.add(node)
for nbr in adj[node]:
if not dfs(nbr):
return False
visited.add(node)
path.append(node)
visiting.remove(node)
return True
for ch in set(''.join(words)):
if not dfs(ch):
return ''
return ''.join(path[::-1])