When the data of more than 500 million Facebook users surfaced earlier this year, the company was quick to announce that this was not a security breach. Indeed, it wasn’t. All the data was “scraped” from the website without the permission of the social media giant. Facebook might not get away as easily as they think. They could’ve prevented this scraping. However, the incident should give all of us something to think about.

Attackers don’t need to illegally access a company’s secured systems these days to find out about someone. We are very generous when it comes to sharing our data. Even privacy regulations like the GDPR or CCPA cannot protect our data if we give it away voluntarily. That makes it easy to target individuals, or to use the data to initiate a phishing attack on a company or organisation. All an attacker needs is a little open-source intelligence (OSINT) and the tools for web scraping.

While most will agree that unregulated data harvesting could be used for malicious purposes, many benefits of the ethical usage of data scraping cannot be overlooked. Digital businesses use web scraping tools to monitor user habits and purchase history to create a personalized experience for users. Search engines use it to deliver relevant search results. It has also led to huge advancements in the areas of machine learning and artificial intelligence.

Unfortunately, there are no uniform, internationally accepted laws or regulations that deal with web scraping. While the discussion about regulation continues, the industry itself focuses on self-regulation and the education of internet users. We asked Karolis Toleikis, CEO of IPRoyal, to explain the basics of web scraping.

Cyber Protection Magazine: What is web scraping?

Karolis Toleikis: Web scraping, also known as web harvesting or crawling, is the extraction of data from websites and its conversion into a structured format (like spreadsheets) for the user. While manual web scraping is possible, automated tools are preferred because they save time.
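
(Editor’s note: to make “converting it into a structured format” concrete, here is a minimal Python sketch using only the standard library. It is an illustration under our own assumptions, not a description of any particular scraping tool: the sample page, the class name TableRowParser and the function html_table_to_csv are all made up for this example, and no website is contacted.)

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collects the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Turn the table rows found in an HTML string into CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page standing in for a publicly visible profile list.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Alice Example</td><td>Berlin</td></tr>
  <tr><td>Bob Example</td><td>Vilnius</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_PAGE))
```

(The resulting text can be opened directly in a spreadsheet program. Real scraping tools add page fetching, pagination and error handling on top of this basic extract-and-structure step.)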

Cyber Protection Magazine: Isn’t web scraping against the idea of privacy? After all, I only gave my data to a particular provider, not those who are scraping it.

Karolis Toleikis: When a user publishes something publicly it is fair game. The automated tools just make it easier.

Cyber Protection Magazine: How can attackers utilize the data they scraped (e.g., phishing)?

Karolis Toleikis: Email addresses, birthdays, addresses, phone numbers, lists of friends. Any publicly available information. The gathering itself is not illegal. With that data, attackers can target individuals or businesses with scam emails to get them to reveal more personal information. The crime lies in those follow-up activities and in how the gathered data is used.

Cyber Protection Magazine: How can individuals protect against scraping?

Karolis Toleikis: There are many things website owners and administrators can do to keep your data safe. For one thing, don’t post your full birthday on Facebook. It’s fun to get birthday wishes, but your birthday is often used to verify your identity. The best you can do as an individual is to make sure you don’t share any private data with websites that don’t guarantee it will stay private. Also, don’t post any personal information in publicly available places. That’s the only way to make sure your data stays out of malicious hands. (Editor’s note: yesterday a venture capitalist asked us why he was getting notifications from a social media site about problems he was supposedly having accessing his account. He wasn’t having any, but someone was trying to guess his password to access his social media contacts and impersonate him. We recommended activating multi-factor authentication immediately.)

Cyber Protection Magazine: How can companies protect their customer or company data from scraping?

Karolis Toleikis: If a company requires a login to access any sensitive data, scraping becomes nearly impossible. Using CAPTCHAs successfully separates human visitors from bots, so it’s another great way to prevent data gathering. Finally, regularly updating the website’s HTML code and keeping a close eye on user accounts with abnormal activity patterns are further ways to make things harder for anyone trying to harvest data.
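
(Editor’s note: “keeping a close eye on user accounts with abnormal activity patterns” could, in its simplest form, mean counting how many requests each account makes within a short time window. The Python sketch below is our own illustration, not something described in the interview: the class name RequestRateMonitor and the thresholds are hypothetical, and it assumes timestamps for each account arrive in non-decreasing order.)

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Hypothetical sliding-window counter that flags unusually busy accounts."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # account id -> recent request timestamps

    def record(self, account_id, timestamp):
        """Record one request; return True if the account now looks abnormal."""
        hits = self._hits[account_id]
        hits.append(timestamp)
        # Forget requests that have fallen out of the time window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


if __name__ == "__main__":
    monitor = RequestRateMonitor(max_requests=3, window_seconds=10)
    for second in range(5):
        flagged = monitor.record("account-1", second)
        print(second, flagged)
```

(A flagged account would typically be slowed down, shown a CAPTCHA or reviewed by a person rather than blocked outright, since busy human users can also trip a simple counter like this.)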
