Brands strive to protect themselves online by taking control of how their products are sold.
Intense competition has driven the adoption of numerous data-driven technological solutions. Over the past decade, brands have undergone a digital transformation: going omnichannel, closing brick-and-mortar stores, and moving their sales to online channels. A brand or company without an online presence puts its future at a disadvantage. However, the growth of online commerce has also opened the door to unfair competition. The client partnered with us for a solution to combat unauthorized sellers, control and grow online sales, achieve MAP compliance, eliminate channel conflicts, and protect brand value and customer experience.
We devised multiple scrapers and built an admin panel through which the client could interact with the system.
This allowed us to exchange data more efficiently. Scraping was triggered by keywords the client uploaded into the admin panel; our job was to scrape the sellers and products related to those keywords. This allowed us to collect 4 million products from Walmart and 20 million reviews from Amazon. Scraping giant platforms like Walmart and Amazon is a tough nut to crack, not only because of the sheer number of products and pages, but also because such websites adopt strict measures to limit scraping. It is not always clear when, or whether, a process is delivering, since product and catalogue pages differ in structure and can confuse the scraper logic.
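The keyword-driven trigger described above can be sketched roughly as follows. The names `ScrapeJob` and `build_jobs` are hypothetical, used only to illustrate turning admin-panel keywords into work items, and are not the client's actual code:

```python
from dataclasses import dataclass


@dataclass
class ScrapeJob:
    """One unit of work: a keyword uploaded through the admin panel."""
    keyword: str
    platforms: tuple = ("walmart", "amazon")


def build_jobs(keywords):
    """Turn raw admin-panel keywords into scrape jobs,
    skipping blanks and case-insensitive duplicates."""
    seen, jobs = set(), []
    for kw in keywords:
        kw = kw.strip().lower()
        if kw and kw not in seen:
            seen.add(kw)
            jobs.append(ScrapeJob(keyword=kw))
    return jobs


jobs = build_jobs(["Blender", " blender ", "", "Air Fryer"])
```

Deduplicating and normalizing up front keeps the downstream scrapers from repeating identical crawls for the same keyword.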
The challenge was not just to build a crawler, but to build one that would run smoothly despite the vast amount and variety of input data it would be exposed to. The crawler needed to be highly resilient, which we achieved through a combination of request scheduling techniques and IP rotation designed to avoid identifiable bot behavior patterns. Listed below are some precautionary measures we followed throughout the process:
• IP randomization
• IP addresses within reasonable proximity of the store
• Keeping the chosen IP for the entire scraping session
• Rotating the proxy pool every 24 hours
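The measures above can be sketched as a minimal proxy-pool manager. This is an illustrative sketch, not the production implementation: `ProxyPool` and its methods are hypothetical names, and a real pool would fetch fresh addresses from a proxy provider rather than reshuffle a fixed list:

```python
import random
import time


class ProxyPool:
    """Rotating proxy pool: each scraping session keeps one sticky IP,
    and the whole pool is swapped out every 24 hours."""

    REFRESH_SECONDS = 24 * 60 * 60

    def __init__(self, proxies, now=time.time):
        self._now = now                  # injectable clock, eases testing
        self._proxies = list(proxies)
        self._refreshed_at = now()
        self._sessions = {}              # session_id -> assigned proxy

    def _maybe_refresh(self):
        if self._now() - self._refreshed_at >= self.REFRESH_SECONDS:
            # Production code would pull a fresh pool from the provider;
            # here we just reshuffle and drop the sticky assignments.
            random.shuffle(self._proxies)
            self._sessions.clear()
            self._refreshed_at = self._now()

    def proxy_for(self, session_id):
        """Return the sticky proxy for a session, assigning one at random
        on first use so requests within a session look consistent."""
        self._maybe_refresh()
        if session_id not in self._sessions:
            self._sessions[session_id] = random.choice(self._proxies)
        return self._sessions[session_id]
```

Keeping one IP per session while randomizing across sessions mimics ordinary visitor traffic far better than switching IPs on every request.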
Walmart applies the AJAX technique to its pagination button, so we built the algorithm to take the completion of the loading process as its cue to start parsing.
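The "wait for the load, then parse" idea reduces to a generic polling helper like the sketch below. The `wait_for` name and signature are illustrative assumptions; in practice the condition would check, via a headless browser, that the AJAX-loaded product grid contains new items:

```python
import time


def wait_for(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout`
    elapses. Serves as the cue that an AJAX page load has finished
    before the scraper starts parsing the new content."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("page content did not finish loading in time")
```

Polling a concrete readiness signal is more robust than a fixed sleep, because AJAX load times vary with page size and network conditions.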
We streamlined the process so that data scraping executed in 100-150 simultaneous streams. This allowed us to collect 20 million customer reviews from Amazon within the duration of the project. For Walmart, pagination was repeated more than 1,000 times per provided keyword, and we ended up extracting data for up to 4 million products from the website.
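Fanning work out across 100-150 streams can be sketched with Python's standard thread pool. This is a minimal illustration under assumed names (`scrape_in_streams`, `fetch`), not the actual pipeline:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_in_streams(tasks, fetch, max_streams=150):
    """Run `fetch` over `tasks` in up to `max_streams` concurrent
    streams, collecting results as they complete.
    Completion order is not guaranteed."""
    results = []
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        futures = [pool.submit(fetch, task) for task in tasks]
        for future in as_completed(futures):
            results.append(future.result())
    return results


# Usage with a stand-in fetch function (a real one would issue
# an HTTP request through the proxy pool and parse the response):
collected = scrape_in_streams(range(10), lambda t: t * 2, max_streams=4)
```

Threads suit this workload because each stream spends most of its time waiting on network I/O rather than computing.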
Our client has successfully launched an eControl service for its clients, and it is currently helping dozens of US brands stay protected online.
The data pipeline enables the company's legal investigation of unfair retail sales practices. As a result, the client has been using the collected data to counter unfair competition against big brands and to prevent brand erosion caused by price dumping. Working with us ensured a stable flow of fresh, quality data on the provided keywords, products, and suppliers. Our cooperation is still ongoing.