by Taoli Shen
mentor: Xiaozhe Hu, Mathematics; funding source: NSF TRIPODSTaoli-Shen_Poster
You may think the title of my project is a bit intimidating — HodgeRank?? Certainly, we can go deep into the math discussion here, but the idea is very simple: how to rank up items. The summer research I conducted with Professor Xiaozhe Hu aims to explore more efficient ways to generate an unprejudiced ranking for a list of objects online. “HodgeRank” is simply a theorem that is the very theoretical foundation of my research, and you can find a more detailed explanation in my poster.
Here a natural question may arise. What difference does it make between rankings online and those that are not? The distinction is that online rankings are constantly changing. Take Netflix for example. A whopping 165 million hours of Netflix is being watched every day across the continents. Hence imagine the enormous amount of data flooding into the database that is influential of each show’s ranking on Netflix.
The greater the volume of the data is, the more issues are likely to appear. A common challenge is the inconsistency of the data. Namely, some people watched certain movies while others don’t, which may lead to the following dilemma. A Hollywood blockbuster has been watched by 12 million people but only gets a 5.6 out of 10. Meanwhile, an independent film with only 20 ratings shows a record-high 9.7 out of 10. Which one should be ranked higher? So here comes a question of how to take into account all the information to result in a fair ranking.
Another challenge that appears more obvious is the speed of computation. We need to come up with ways that can quickly transform those fresh data into a newly updated ranking because viewers would like to see those daily changes. No one wants to wait for 20 days to see if their favorite show finally makes it into the top 10 list. One way to update the online ranking is simply to calculate the ranking all over again using all the available data. However, this can be extremely slow for a large database. To tackle the above challenges and to make the ranking come out as fast and as accurate as possible, we develop ways to update the ranking based on the existing rank. This can be very efficient since we only need to calculate a tiny portion compared to the existing info. Say we have 2000 viewers rated movies today, but there are 20 million ratings already, which is 1/1000000 the original work! However, how to connect the existing rank to our new rank can be subtle, and please refer to my poster for a comprehensive look.