Python coding: Radix Sorting (Movie buddy recommendation)

Spread the love

Task : Sorting recommendation

Tom has written a script to obtain a file called favoriteMovies.txt which contains favorite movies for each user. The favorite movies for each user are not listed in any particular order. He wants to introduce a movie buddy recommendation feature that sends people notification about other people who like exactly the same set of movies. He wants your help in writing an algorithm that generates groups of people who like exactly the same set of movies. Below are the details.

Input

The input file named favoriteMovies.txt records, for each user, a list of its favorite movies.

Below are the first three lines from the file.

1: UNFRIENDED, THE SECOND BEST EXOTIC MARIGOLD HOTEL,THE WEDDING RINGER,GET HARD,ALVIN AND THE CHIPMUNKS THE ROAD CHIP

2: MINIONS,UNFRIENDED,CRIMSON PEAK,SICARIO,SPY

3: PAPER TOWNS,THE PERFECT GUY,ENTOURAGE,CREED,INSIDE OUT,A WALK IN THE WOODS, THE AGE OF ADALINE

Each user ID and its list of movies is separated by a colon :. In each list, movie names are separated by a comma (,). You can safely assume that the movie names consist only of letters from English alphabet in upper case. In the above example, user 1 likes two movies: Fantastic Four; The Second Best Exotic Marigold Hotel. The user 4 likes movies named Minions, Unfriended, Crimson Peak, Sicario and Spy.

Output

The example of output would be printing groups of users (of size at least 2) such that all users in a group like exactly the same set of movies. Below is a sample output for the above file.

GROUP 1

Movies: JURASSIC WORLD,A WALK IN THE WOODS

Buddies: 5921,8894

GROUP 2

Movies: THE SECOND BEST EXOTIC MARIGOLD HOTEL,THE VISIT,NO ESCAPE,SISTERS,

MINIONS,PADDINGTON,BLACK MASS

Buddies: 2819,6836,8294,8573

GROUP 3

Movies: WAR ROOM,TRAINWRECK,CHAPPIE,STRAIGHT OUTTA COMPTON

Buddies: 1672,2007,6710

def countSort(word):
    '''
    Sorts words in alphabetical order 
    
    Input: word(only alphabetical characters) in a list 
    Output: Sorted word alphabetically 
    Time complexity: O(c), where c is the number of characters in a word
    Space complexity: O(1)
    '''
    word = word.lower()
    countletter = [0]*26
    sortedWord = ''
    for i in word:
        countletter[ord(i)-97]+= 1
    for i in range(len(countletter)):
        sortedWord += chr(i+97)*countletter[i]
    return sortedWord


def sortedletter(data):
    '''
    Sorts words by letters for each word in a list 
    
    Input: Words in a list 
    Output: Sorted letter 
    Time complexity: O(cu), where u the total number of users
    Space complexity: O(cu)
    '''
    for word in data:
        sortedWord = countSort(word[2])
        word[2] = sortedWord
    return data



def String_radixSort(data):
    '''
    Sorts the order for list of lists into lexicographical order 
    
    Input: word(sorted letter) in a list
    Output: list of word list in lexicographical order
    Time complexity: O(kcu), where k is maximum number of words liked by a user
    Space complexity: O(kcu)
    '''
    data = sortedletter(data)
    alpha_len = 27
    max_length = 0
    for i in range(len(data)):
        word = len(data[i][2])
        if word > max_length:
            max_length = word    
            
    for pos in range(max_length - 1, -1, -1):
        buckets = [[] for i in range(alpha_len)]
        for i in range(len(data)):
            current_length = len(data[i][2])
            if pos < current_length:
                temp = ord(data[i][2][pos]) - ord("a")    
                buckets[temp].append(data[i])
            else:
                buckets[alpha_len-1].append(data[i]) #when character is just space
        a = 0
        for b in range(alpha_len): #(o(nm)) where m =27 n= size of buck
            buck = buckets[b]
            for i in range(len(buck)):
                data[a] = buck[i]
                a += 1
    return data

def grouping(data):
    """
    Find common movie and group them
    
    Input: the list of data
    Ouput: Print group
    Time complexity: O(mn) where m is number of groups and n is number of users
    Space complexity: O(1)
    """
    d = " "
    data_list = [] 
    i = 0
    j = 1
    k = 1

    while i in range(len(data)): # i = o. m(no.of groups)
        d += str(data[i][0]) + ", "
        data_list.append(data[i][0])
        while j in range(len(data)): # j = o. n(no. of users in a group)
            
            if data[i][2] == data[j][2]: #if not just print as group
                d += str(data[j][0]) + ", " 
                data_list.append(data[j][0])
                j += 1 
            else:
                break
        if len(data_list) > 1:
            print("GROUP " + str(k))
            k+=1
            print("Movies: " + str(data[i][1]))
            print("Buddies: " + d + "\n")
            
        d = ""
        data_list = []
        i = j 
        j = j + 1

            
def openfile():
    file = open("favoriteMovies.txt",'r')
    lines = file.readlines()
    file.close()
    data = []
    for i in lines:
        alist = i.strip().split(':')
        words = alist[1]
        
        words = words.replace(',','')
        words = words.replace(' ','')
        alist.append(words)
        data.append(alist)

    return data
    

if __name__=="__main__":
    x= [[1,["cat","dogs"],"catdogs"],[]]
    f= openfile()
    sortedletterlist = (String_radixSort(f))
    grouping(sortedletterlist)

Algorithm Time complexity

1) openfile() function will read input text file and split the string into an array in this below format. It takes O(U) times where U is the total number of users. Data = [[User ID, Movie name, Raw_Word=Movie name without space and comma]] 

2) The Raw_Word will pass to sortedletter() function to sort each word of length the C where maximum number of characters word. Each Raw_Word will be sorted alphabetically into a sortedWord with countSort(), appending the list into bucket. The worst-case time complexity of this operation is O (CU) where U the total number of users. 

Worst case time complexity: O(CU) 

Worst case space complexity: O(CU) 

3) In the String_radixSort() function, first, it calls the sortedletter() function to sorts words by letter for each word in data list in lexicographical order. This takes K*(CU) where K is the maximum number of movies liked by a user, U times where U is the total number of users and since countSort() is stable which has time and space complexity O(CU). Thus, the worst case time complexity is O(K*(CU)) = O(KU), where C is constant for English alphabets. 

Worst case time complexity: O(KU) 

Worst case space complexity: O(KU) 

4) After the words in the list sorted into lexicographical order, grouping() function will perform to find the same sorted word by comparing each word and print them into the same group if they are same. The function will print groups if there are any common movies more than 2. This operation takes M*N times where M is the number of groups and U is the number of users. Space complexity is O(U) where U is the total number of users. 

Worst case time complexity: O(MU) 

Worst case space complexity: O(U) 


By Python Helper

Author: preservsun

Leave a Reply

Your email address will not be published. Required fields are marked *