Task : Sorting recommendation
Tom has written a script to obtain a file called favoriteMovies.txt which contains favorite movies for each user. The favorite movies for each user are not listed in any particular order. He wants to introduce a movie buddy recommendation feature that sends people notification about other people who like exactly the same set of movies. He wants your help in writing an algorithm that generates groups of people who like exactly the same set of movies. Below are the details.
Input
The input file named favoriteMovies.txt records, for each user, a list of its favorite movies.
Below are the first three lines from the file.
1: UNFRIENDED, THE SECOND BEST EXOTIC MARIGOLD HOTEL,THE WEDDING RINGER,GET HARD,ALVIN AND THE CHIPMUNKS THE ROAD CHIP
2: MINIONS,UNFRIENDED,CRIMSON PEAK,SICARIO,SPY
3: PAPER TOWNS,THE PERFECT GUY,ENTOURAGE,CREED,INSIDE OUT,A WALK IN THE WOODS, THE AGE OF ADALINE
Each user ID and its list of movies is separated by a colon :. In each list, movie names are separated by a comma (,). You can safely assume that the movie names consist only of letters from English alphabet in upper case. In the above example, user 1 likes two movies: Fantastic Four; The Second Best Exotic Marigold Hotel. The user 4 likes movies named Minions, Unfriended, Crimson Peak, Sicario and Spy.
Output
The example of output would be printing groups of users (of size at least 2) such that all users in a group like exactly the same set of movies. Below is a sample output for the above file.
GROUP 1
Movies: JURASSIC WORLD,A WALK IN THE WOODS
Buddies: 5921,8894
GROUP 2
Movies: THE SECOND BEST EXOTIC MARIGOLD HOTEL,THE VISIT,NO ESCAPE,SISTERS,
MINIONS,PADDINGTON,BLACK MASS
Buddies: 2819,6836,8294,8573
GROUP 3
Movies: WAR ROOM,TRAINWRECK,CHAPPIE,STRAIGHT OUTTA COMPTON
Buddies: 1672,2007,6710
def countSort(word): ''' Sorts words in alphabetical order Input: word(only alphabetical characters) in a list Output: Sorted word alphabetically Time complexity: O(c), where c is the number of characters in a word Space complexity: O(1) ''' word = word.lower() countletter = [0]*26 sortedWord = '' for i in word: countletter[ord(i)-97]+= 1 for i in range(len(countletter)): sortedWord += chr(i+97)*countletter[i] return sortedWord def sortedletter(data): ''' Sorts words by letters for each word in a list Input: Words in a list Output: Sorted letter Time complexity: O(cu), where u the total number of users Space complexity: O(cu) ''' for word in data: sortedWord = countSort(word[2]) word[2] = sortedWord return data def String_radixSort(data): ''' Sorts the order for list of lists into lexicographical order Input: word(sorted letter) in a list Output: list of word list in lexicographical order Time complexity: O(kcu), where k is maximum number of words liked by a user Space complexity: O(kcu) ''' data = sortedletter(data) alpha_len = 27 max_length = 0 for i in range(len(data)): word = len(data[i][2]) if word > max_length: max_length = word for pos in range(max_length - 1, -1, -1): buckets = [[] for i in range(alpha_len)] for i in range(len(data)): current_length = len(data[i][2]) if pos < current_length: temp = ord(data[i][2][pos]) - ord("a") buckets[temp].append(data[i]) else: buckets[alpha_len-1].append(data[i]) #when character is just space a = 0 for b in range(alpha_len): #(o(nm)) where m =27 n= size of buck buck = buckets[b] for i in range(len(buck)): data[a] = buck[i] a += 1 return data def grouping(data): """ Find common movie and group them Input: the list of data Ouput: Print group Time complexity: O(mn) where m is number of groups and n is number of users Space complexity: O(1) """ d = " " data_list = [] i = 0 j = 1 k = 1 while i in range(len(data)): # i = o. m(no.of groups) d += str(data[i][0]) + ", " data_list.append(data[i][0]) while j in range(len(data)): # j = o. n(no. of users in a group) if data[i][2] == data[j][2]: #if not just print as group d += str(data[j][0]) + ", " data_list.append(data[j][0]) j += 1 else: break if len(data_list) > 1: print("GROUP " + str(k)) k+=1 print("Movies: " + str(data[i][1])) print("Buddies: " + d + "\n") d = "" data_list = [] i = j j = j + 1 def openfile(): file = open("favoriteMovies.txt",'r') lines = file.readlines() file.close() data = [] for i in lines: alist = i.strip().split(':') words = alist[1] words = words.replace(',','') words = words.replace(' ','') alist.append(words) data.append(alist) return data if __name__=="__main__": x= [[1,["cat","dogs"],"catdogs"],[]] f= openfile() sortedletterlist = (String_radixSort(f)) grouping(sortedletterlist)
Algorithm Time complexity
1) openfile() function will read input text file and split the string into an array in this below format. It takes O(U) times where U is the total number of users. Data = [[User ID, Movie name, Raw_Word=Movie name without space and comma]]
2) The Raw_Word will pass to sortedletter() function to sort each word of length the C where maximum number of characters word. Each Raw_Word will be sorted alphabetically into a sortedWord with countSort(), appending the list into bucket. The worst-case time complexity of this operation is O (CU) where U the total number of users.
Worst case time complexity: O(CU)
Worst case space complexity: O(CU)
3) In the String_radixSort() function, first, it calls the sortedletter() function to sorts words by letter for each word in data list in lexicographical order. This takes K*(CU) where K is the maximum number of movies liked by a user, U times where U is the total number of users and since countSort() is stable which has time and space complexity O(CU). Thus, the worst case time complexity is O(K*(CU)) = O(KU), where C is constant for English alphabets.
Worst case time complexity: O(KU)
Worst case space complexity: O(KU)
4) After the words in the list sorted into lexicographical order, grouping() function will perform to find the same sorted word by comparing each word and print them into the same group if they are same. The function will print groups if there are any common movies more than 2. This operation takes M*N times where M is the number of groups and U is the number of users. Space complexity is O(U) where U is the total number of users.
Worst case time complexity: O(MU)
Worst case space complexity: O(U)