Cattail – Clickbait Analyzer

Inspiration

While researching project ideas for Bitcamp, we waded through tons of useless articles and webpages that we would classify as “clickbait”. These formulaic articles were full of repetitive terms and useless information, so we decided to build software that tells people whether a page is clickbait, drawing on data sources such as published research and statistics.

What it does

The extension scans the current webpage and analyses the language used in places like the title. We look for clickbait-like characteristics such as the use of the second person in the title and more informal language. After gathering the keyword information, the software analyses the collected data and gives the user a summary of its findings about the article. If users believe the verdict is inaccurate after reading the report, they can click the “Inaccurate Data” button, which submits a report directly to the main server. This report includes the user’s current URL, the title of the page, and other non-confidential information. The server then periodically runs software to aggregate and process the inaccuracy reports from users. If necessary, the data can also be analysed manually to fine-tune the categorisation algorithm.
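To make the idea concrete, here is a minimal sketch of how a title check and an inaccuracy report might look. It is not the extension's actual code: the word lists, the two-signal threshold, the function names, and the `reportedAt` field are all assumptions chosen for illustration.

```javascript
// Hypothetical word lists; the write-up does not publish the real ones.
const SECOND_PERSON = ["you", "your", "yours", "yourself", "you'll", "you're", "you've"];
const INFORMAL = ["omg", "wow", "lol", "gonna", "wanna", "awesome", "insane"];

// Lower-case the title and split it into word tokens (apostrophes are kept).
function tokenize(title) {
  return title.toLowerCase().replace(/’/g, "'").match(/[a-z0-9']+/g) || [];
}

// Collect simple clickbait signals from a page title.
function analyseTitle(title) {
  const words = tokenize(title);
  const secondPerson = words.filter((w) => SECOND_PERSON.includes(w));
  const informal = words.filter((w) => INFORMAL.includes(w));
  const signals = secondPerson.length + informal.length;
  return {
    secondPerson,
    informal,
    // Assumed rule: flag the title when at least two signals are present.
    likelyClickbait: signals >= 2,
  };
}

// Build the payload sent when the user clicks "Inaccurate Data".
// The URL and title come from the write-up; the other fields are assumptions.
function buildInaccuracyReport(url, title, analysis) {
  return {
    url,
    title,
    verdict: analysis.likelyClickbait,
    reportedAt: new Date().toISOString(),
  };
}
```

With these assumed lists, a title such as "You Won't Believe These Insane Tricks" produces one second-person hit and one informal hit, so it is flagged, while a plain headline with neither is not.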

How we built it

The extension was built on Google Chrome’s extension framework. We used HTML and CSS with the Skeleton CSS toolkit to create the extension’s pop-up interface, and JavaScript with jQuery to analyse the webpage.
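The write-up does not say how the pop-up obtains data from the page, so the following is only one plausible arrangement: a content script that answers a request from the pop-up through Chrome's message passing. The `GET_PAGE_INFO` message type and the `handleMessage` helper are made-up names for this sketch.

```javascript
// Content-script side: answer the pop-up's request with the page details.
// The message shape ({ type: "GET_PAGE_INFO" }) is a hypothetical convention.
function handleMessage(message, doc, loc) {
  if (message && message.type === "GET_PAGE_INFO") {
    return { title: doc.title, url: loc.href };
  }
  return null; // Ignore messages this script does not understand.
}

// Register the listener only when running inside a Chrome extension,
// so the helper above can also be exercised outside the browser.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    const reply = handleMessage(message, document, window.location);
    if (reply) sendResponse(reply);
  });
}
```

On the pop-up side, the matching request could be sent by looking up the active tab with `chrome.tabs.query` and calling `chrome.tabs.sendMessage` with the same message object. Keeping `handleMessage` free of browser globals makes the page-reading logic easy to test on its own.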

Challenges we ran into

We found it difficult to classify certain ambiguous strings. For example, some titles were extremely “clickbait-like” but were hard to classify from the information we collected. In those cases, we had to either rework the keyword analysis algorithm or alter one of the conditions to place the string into the right category. Both approaches had pros and cons, such as affecting the accuracy of the analysis. We tried to balance the costs and benefits and to follow the published research as closely as we could, so that users get the most accurate responses possible.

Accomplishments that we’re proud of

While developing this project, we learned how to communicate between objects inside the browser. We also collaborated well, with each of us contributing almost the same amount of effort, and we managed to finish this logically demanding project on time. We are also very proud of the entire system we came up with, which runs from the client to the developer’s side, with the result coming directly back to the client’s side.

What we learned

We learned that even though the overall idea sounded relatively simple, implementing it was harder than we expected. We also taught ourselves a lot about how server-side operations work, and although they are not used much in the current version of the project, we look forward to using them to make the project work better in the future.

What’s next?

Instead of relying on an algorithm based solely on keywords and keyword patterns, we may also include emotional analysis, author credibility, and natural language analysis, using machine learning to teach the machine how to recognise clickbait rather than relying on a coarse-grained keyword search. For this reason, we are very interested in using IBM’s cognitive technologies such as Tone Analyzer and Personality Insights.

Data / List Sources

https://blog.bufferapp.com/the-most-popular-words-in-most-viral-headlines

http://minimaxir.com/2015/01/linkbait/

http://www.siegemedia.com/seo/the-most-common-words-in-high-ranking-title-tags