Brands strive to protect themselves online by taking control of how their products are sold.
Intense competition has driven the adoption of numerous data-driven technological solutions. Over the past decade, brands have undergone a digital transformation: going omnichannel, closing brick-and-mortar stores, and moving their sales to online channels. A brand or company without an online presence puts its future at a disadvantage. However, the growth of online commerce has also opened the door to unfair competition. The client partnered with us for a solution to combat unauthorized sellers, control and grow online sales, achieve MAP compliance, eliminate channel conflicts, and protect brand value and customer experience.
We devised multiple scrapers and built an admin panel through which the client could interact with the system.
This allowed us to exchange data more efficiently. Scraping was triggered by keywords the client uploaded into the admin panel; our job was to scrape the sellers and products related to those keywords. This allowed us to collect 4 million products from Walmart and 20 million reviews from Amazon. Scraping giant platforms like Walmart and Amazon is a tough nut to crack, not only because of the sheer number of products and pages, but also because such websites adopt strict measures to limit scraping. It is not always clear when, or whether, a process is delivering, since product and catalogue pages differ in structure and can confuse the scraper logic.
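The keyword-driven trigger described above can be sketched roughly as follows. The names `ScrapeJob` and `build_jobs` are hypothetical, used only to illustrate turning admin-panel keywords into work items, and are not the client's actual code:

```python
from dataclasses import dataclass


@dataclass
class ScrapeJob:
    """One unit of work: a keyword uploaded through the admin panel."""
    keyword: str
    platforms: tuple = ("walmart", "amazon")


def build_jobs(keywords):
    """Turn raw admin-panel keywords into scrape jobs,
    skipping blanks and case-insensitive duplicates."""
    seen, jobs = set(), []
    for kw in keywords:
        kw = kw.strip().lower()
        if kw and kw not in seen:
            seen.add(kw)
            jobs.append(ScrapeJob(keyword=kw))
    return jobs


jobs = build_jobs(["Blender", " blender ", "", "Air Fryer"])
```

Deduplicating and normalizing up front keeps the downstream scrapers from repeating identical crawls for the same keyword.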
The challenge was not just to build a crawler, but to build one that would run smoothly despite the vast amount and variety of input data it would be exposed to. The crawler needed to be highly resilient, which we achieved through a combination of request scheduling techniques and IP rotation designed to avoid identifiable bot behavior patterns. Listed below are some precautionary measures we followed throughout the process:
• IP randomization
• IP addresses within reasonable proximity of the store
• Keeping the chosen IP for the entire scraping session
• Rotating the proxy pool every 24 hours
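The measures above can be sketched as a minimal proxy-pool manager. This is an illustrative sketch, not the production implementation: `ProxyPool` and its methods are hypothetical names, and a real pool would fetch fresh addresses from a proxy provider rather than reshuffle a fixed list:

```python
import random
import time


class ProxyPool:
    """Rotating proxy pool: each scraping session keeps one sticky IP,
    and the whole pool is swapped out every 24 hours."""

    REFRESH_SECONDS = 24 * 60 * 60

    def __init__(self, proxies, now=time.time):
        self._now = now                  # injectable clock, eases testing
        self._proxies = list(proxies)
        self._refreshed_at = now()
        self._sessions = {}              # session_id -> assigned proxy

    def _maybe_refresh(self):
        if self._now() - self._refreshed_at >= self.REFRESH_SECONDS:
            # Production code would pull a fresh pool from the provider;
            # here we just reshuffle and drop the sticky assignments.
            random.shuffle(self._proxies)
            self._sessions.clear()
            self._refreshed_at = self._now()

    def proxy_for(self, session_id):
        """Return the sticky proxy for a session, assigning one at random
        on first use so requests within a session look consistent."""
        self._maybe_refresh()
        if session_id not in self._sessions:
            self._sessions[session_id] = random.choice(self._proxies)
        return self._sessions[session_id]
```

Keeping one IP per session while randomizing across sessions mimics ordinary visitor traffic far better than switching IPs on every request.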
Walmart applies the AJAX technique to its pagination button, so we built the algorithm to take the completion of the loading process as its cue to start parsing.
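The "wait for the load, then parse" idea reduces to a generic polling helper like the sketch below. The `wait_for` name and signature are illustrative assumptions; in practice the condition would check, via a headless browser, that the AJAX-loaded product grid contains new items:

```python
import time


def wait_for(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout`
    elapses. Serves as the cue that an AJAX page load has finished
    before the scraper starts parsing the new content."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("page content did not finish loading in time")
```

Polling a concrete readiness signal is more robust than a fixed sleep, because AJAX load times vary with page size and network conditions.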
We streamlined the process so that data scraping executed in 100-150 simultaneous streams. This allowed us to collect 20 million customer reviews from Amazon within the duration of the project. For Walmart, pagination was repeated more than 1,000 times per provided keyword, and we ended up extracting data for up to 4 million products from the website.
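Fanning work out across 100-150 streams can be sketched with Python's standard thread pool. This is a minimal illustration under assumed names (`scrape_in_streams`, `fetch`), not the actual pipeline:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_in_streams(tasks, fetch, max_streams=150):
    """Run `fetch` over `tasks` in up to `max_streams` concurrent
    streams, collecting results as they complete.
    Completion order is not guaranteed."""
    results = []
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        futures = [pool.submit(fetch, task) for task in tasks]
        for future in as_completed(futures):
            results.append(future.result())
    return results


# Usage with a stand-in fetch function (a real one would issue
# an HTTP request through the proxy pool and parse the response):
collected = scrape_in_streams(range(10), lambda t: t * 2, max_streams=4)
```

Threads suit this workload because each stream spends most of its time waiting on network I/O rather than computing.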
Our client has successfully launched an eControl service for its clients, and it is currently helping dozens of US brands stay protected online.
The data pipeline enables the company's legal investigation of unfair retail sales practices. As a result, the client has been using the collected data to counter unfair competition against big brands and to prevent brand erosion caused by price dumping. Working with us ensured a stable flow of fresh, quality data on the provided keywords, products, and suppliers. Our cooperation is still ongoing.