Verified Twitter Usernames

How to get a list of all verified accounts on Twitter?!

Dipansh Khandelwal
5 min read · Dec 31, 2020

A year ago, I faced an interesting problem: I had to get the usernames of all the verified accounts on Twitter.
How would you go about it?

Image source: https://www.authormedia.com/how-to-set-up-a-twitter-profile/

What follows is the journey that led me to the final solution.

TL;DR: There is no single correct approach to solving this.

Brute Force 😂 🤷‍♂

The worst possible approach would be to iterate over every possible username, from length 1 up to the maximum length Twitter allows, hit the Twitter API for each one, and collect all those marked as verified.

Think !!
We could even have opened each username’s profile and checked it manually. It would take our whole life, but still… 🤔 😂

But there is a catch: retrieving user details through Twitter’s API is rate-limited to 900 requests per 15-minute window.

An unknown total number of users on Twitter, a space complexity of “probably infinity”, and a time complexity of O(n) for an unknown n are only a few of the issues with this brute-force method. There was no point in even writing a script for it.
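To get a feel for how hopeless this is, here is a back-of-the-envelope sketch in Python. It assumes usernames are case-insensitive, use only letters, digits, and underscores, and are at most 15 characters long, and that every lookup costs one request against the limit of 900 requests per 15 minutes mentioned above. The constant and function names are made up for illustration.

```python
import string

# Assumed username alphabet: a-z, 0-9 and "_" (handles are case-insensitive).
ALPHABET = string.ascii_lowercase + string.digits + "_"
MAX_LEN = 15               # assumed maximum username length
REQUESTS_PER_WINDOW = 900  # rate limit quoted in the text
WINDOW_MINUTES = 15


def candidate_count(alphabet_size, max_len):
    """Number of strings of length 1..max_len over the alphabet."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))


def years_to_check(candidates, per_window=REQUESTS_PER_WINDOW,
                   window_minutes=WINDOW_MINUTES):
    """Years needed if each candidate costs one rate-limited request."""
    minutes = candidates / per_window * window_minutes
    return minutes / (60 * 24 * 365)


total = candidate_count(len(ALPHABET), MAX_LEN)
years = years_to_check(total)
print(f"{total:.2e} candidate usernames, about {years:.2e} years of requests")
```

Under these assumptions the estimate lands many orders of magnitude beyond a human lifetime, even before counting the cost of storing the results.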

Check out the Twitter API rate limits here: https://developer.twitter.com/en/docs/twitter-api/v1/rate-limits

At this point, I needed more data to measure the size of the problem. I also needed some place that listed only verified usernames (fetching them would be a different ballgame altogether).

twitter@verified

This was a very important breakthrough.
After some research, I found a Twitter account named @verified, which followed all the verified Twitter accounts.

https://twitter.com/verified

Looking at the details of that account, I found that the total number of verified accounts on Twitter was about 361.7K.

Now that I knew the size of my problem, it looked like I could simply get the list of users this account follows, and voilà! I thought I would just scrape the website to get that list, so I started writing a script right away.

I had no clue about the mammoth size of the problems that were waiting for me on the next page.

Scraping twitter web 🕸 💻

This was my first attempt at web scraping, and I went straight to Beautiful Soup. Honestly, I did not think it through and was just rushing, which led to a series of unpleasant surprises.

Beautiful Soup

About beautiful soup …
https://pypi.org/project/beautifulsoup4/

Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.

Initially, when I executed the script, I got some usernames but noticed that I was only able to fetch the first page and needed to go to the next page for more names, i.e. I had to deal with pagination.

But later I realized that Twitter is not paginated but lazy-loaded. So, to get the next set of usernames I needed to scroll the page, which sadly is not possible with Beautiful Soup, since it only parses the HTML it is given!! 😩
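The limitation is easy to reproduce without touching Twitter at all. The sketch below uses Python’s built-in html.parser as a stand-in for Beautiful Soup (so it runs with no extra installs) on a made-up HTML snippet. The markup, the data-role attribute, and the HandleCollector class are all hypothetical. A static parser only sees the rows present in the HTML it was handed; rows that JavaScript would append on scroll never show up.

```python
from html.parser import HTMLParser


class HandleCollector(HTMLParser):
    """Collect handles from <a data-role="profile-link" href="/name"> tags."""

    def __init__(self):
        super().__init__()
        self.handles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if attr_map.get("data-role") == "profile-link":
            href = attr_map.get("href") or ""
            self.handles.append(href.strip("/"))


# Hypothetical first response from the server: only the first rows are present.
INITIAL_HTML = """
<div id="following">
  <a data-role="profile-link" href="/alice">Alice</a>
  <a data-role="profile-link" href="/bob">Bob</a>
  <!-- further rows are added by JavaScript only when the user scrolls -->
</div>
"""

collector = HandleCollector()
collector.feed(INITIAL_HTML)
print(collector.handles)  # only the handles that were in the static HTML
```

However many accounts the real list contains, a parser fed this document can never return more than the two handles in it.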

At this point, I had only 60–70 usernames aggregated.

Some research later, I concluded that the next step was an introduction to Selenium. I thought it would solve the problem and went on to modify the script.

About Selenium

https://pypi.org/project/selenium/
Selenium is an open-source tool for automating web browsers.

PS: Selenium also let me log in to Twitter easily, which could otherwise have been a problem for a script of this kind.
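A scroll-and-collect loop of the kind described here might look roughly like the sketch below. It is not the script from this story. It relies only on execute_script and page_source, which Selenium’s WebDriver provides; the extract_handles regex, the link markup it expects, and the default pause are assumptions for illustration.

```python
import re
import time

SCROLL_TO_BOTTOM = "window.scrollTo(0, document.body.scrollHeight);"
GET_HEIGHT = "return document.body.scrollHeight"


def extract_handles(html):
    """Hypothetical extractor: pull handles out of href="/name" links."""
    return re.findall(r'href="/([A-Za-z0-9_]{1,15})"', html)


def scroll_and_collect(driver, extract=extract_handles, max_scrolls=1000,
                       pause=2.0, sleep=time.sleep):
    """Scroll a lazy-loaded page, harvesting handles after every scroll."""
    seen = set()
    last_height = driver.execute_script(GET_HEIGHT)
    for _ in range(max_scrolls):
        # Harvest before scrolling: older rows may be dropped from the page.
        seen.update(extract(driver.page_source))
        driver.execute_script(SCROLL_TO_BOTTOM)
        sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script(GET_HEIGHT)
        if new_height == last_height:
            break  # nothing new was loaded, assume the end of the list
        last_height = new_height
    seen.update(extract(driver.page_source))
    return seen
```

With a real browser, driver would be something like webdriver.Chrome() after logging in and opening the following page. The pause is a guess and would need tuning to the network speed.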

After running the script, I expected my problem to be solved, but my computer kept crashing with out-of-memory errors, and no amount of optimization brought me any closer to the endgame.

At this point, I had gathered around 2000 verified usernames.

After some further research, I stumbled upon Selenium’s headless mode.

About Selenium headless mode

In headless mode, Selenium drives the browser without opening a GUI (graphical user interface) window, which saves a good amount of local memory.
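As a rough sketch, headless mode is usually switched on through the browser options. The helper below is hypothetical. It only assumes an options object with an add_argument method, which is what Selenium’s ChromeOptions offers, and the --headless flag that Chrome accepts. The window size is an arbitrary choice.

```python
# Flags passed to the browser; the window size is an arbitrary example value.
HEADLESS_ARGS = ("--headless", "--window-size=1280,800")


def make_headless(options):
    """Add headless flags to a browser options object and return it."""
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    return options


# With Selenium installed, usage would look like:
#   from selenium import webdriver
#   driver = webdriver.Chrome(options=make_headless(webdriver.ChromeOptions()))
```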

Why so difficult !!

Multiple attempts later, the results got a little better but not good enough to solve the problem.

At this point, I had gathered around 4000-5000 verified usernames.

Selenium Script

A version of the script up to this point can be found in the following gist:
https://gist.github.com/DipanshKhandelwal/f7acfded1b547fbd76c2b7d7810e6dd9

The taste of failure had slowly started to make me want to give up. It seemed that this problem could only be solved by paying for Twitter’s API, but my engineering-undergrad instinct had taught me not to pay for anything unless I absolutely had to.

The final solution 🎊

Another one of my engineering instincts had taught me that if you can’t solve a problem alone, you can always try it with your friends.

So I collected some Twitter API keys from my friends and waited for the long-running script to fulfill its destiny!
The following is a brief description of the final solution.

Each key returned around 60 usernames per hit, after which further hits with that key raised a limit-exceeded exception for a 15-minute window.
To work around this, I built a loop that cycled through the keys: each key was treated as expired for 15 minutes after returning its set of 60 usernames, then the next key took over in the same way, and after the last key had run, the script waited an extra 15 minutes.
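The rotation idea can be sketched in a few lines of Python. This is not the final script linked at the end, only an illustration under assumptions: fetch_page, RateLimited, and the cursor markers (-1 for the first page, 0 for the end) are hypothetical stand-ins for whatever the real API client provides. Instead of a fixed extra 15-minute pause after the last key, the sketch simply waits until the earliest-used key is available again.

```python
import time

COOLDOWN_SECONDS = 15 * 60  # rate-limit window described above
START_CURSOR = -1           # assumed marker for the first page
END_CURSOR = 0              # assumed marker for "no more pages"


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page when a key is over its limit."""


def collect_with_key_rotation(keys, fetch_page, cooldown=COOLDOWN_SECONDS,
                              clock=time.monotonic, sleep=time.sleep):
    """Page through a following list, resting each key after it is used.

    fetch_page(key, cursor) is a hypothetical callable that returns
    (usernames, next_cursor) or raises RateLimited.
    """
    keys = list(keys)
    if not keys:
        raise ValueError("need at least one API key")
    ready_at = {key: clock() for key in keys}  # when each key is usable again
    usernames = []
    cursor = START_CURSOR
    while cursor != END_CURSOR:
        key = min(ready_at, key=ready_at.get)  # key that frees up soonest
        wait = ready_at[key] - clock()
        if wait > 0:
            sleep(wait)  # every key is cooling down, wait for the first one
        try:
            page, cursor = fetch_page(key, cursor)
            usernames.extend(page)
        except RateLimited:
            pass  # the same cursor is retried with the next available key
        # Either way, rest this key for one full window before reusing it.
        ready_at[key] = clock() + cooldown
    return usernames
```

The clock and sleep parameters exist only so the loop can be exercised with a fake clock instead of actually waiting 15 minutes between rounds.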

Obviously, a script that runs this long would have tied up my local system, so I ran it on an EC2 instance, where it finished in roughly a day.

Finally, I had aggregated all ~360.7K verified Twitter usernames.

We did it !!

Congratulations!! You made it to the end.

You can find the final script here:

https://gist.github.com/DipanshKhandelwal/92e3d51531e3e01a14fa51bd50eec6ff

Hit me up if you have any questions; I’ll be glad to help.
Please let me know your thoughts on it, and also other solutions that you can think of 😃 !!

Thank you for reading 😃

Hey! I am Dipansh Khandelwal, a Computer Science Engineer and a Full Stack Developer. I have been programming for around 4 years now, working remotely most of the time, with a wide tech stack including native Android and iOS, React Native, React, Firebase, Django, Express and more. I like to listen to audiobooks and play badminton.

You can connect with me on LinkedIn (@dipanshkhandelwal) and check out some more work on GitHub (@DipanshKhandelwal).
