Data Quality issues cost you as a Consumer time and money: Shopping on Amazon as an example

Data Quality issues? Isn’t that something relevant only to business analysts, data-warehouse specialists, or market researchers? Ten years ago most of us would probably have thought so. And according to Gartner, Inc. (the world’s leading information technology research and advisory company), the cost of poor Data Quality will continue to hurt organizations around the world (listen to the interesting Gartner Voice podcast: The Cost of poor Data Quality).

We are going to show in this article that the Internet has changed that as well: Data Quality issues are no longer hidden behind corporate (fire)walls. The Internet “transports” Data Quality issues directly to the computer of everyone who shops online.
To make this article more concrete, we demonstrate that Data Quality issues cost you as a Consumer time and money by looking at the largest online retailer as an example: Amazon (probably every other retailer has the same issues, but Amazon’s size makes them easier to spot). Please read on if you do not want to pay $20 too much for your next USB Hub just because of Data Quality issues in Amazon’s product catalog.


Because of its proven excellent service, Amazon is my preferred online shop. I am interested in buying this Hi-Speed USB 2.0 7-Port Hub produced by Belkin via Amazon because someone recommended the product to me. I expected to find the Hub at a reasonable price simply by searching Amazon for the Belkin part number “F5U237v1”, which I found on the Belkin product page. Try this Amazon search yourself. Today I got this result (see the image for details):

[Image: excerpt of the Amazon search results for the Belkin Hub]

Surprisingly, one and the same Hub appears in the Amazon product catalog not once but eight times. It goes by different names such as “7-PORT USB 2.0 Hub by Belkin”, “Belkin Hi-Speed USB 2.0 7-Port Hub F5U237v1”, “BLKF5U237V1 USB Hub, 2.0, 7-Port, Black”, and others. For each of these eight entries, several different merchants offer the product. Prices across these redundant catalog entries range from $30 to $63.99.
If you are unlucky, you might first navigate to “Home+Garden”/“Home Improvement”. Searching for the Hub from there finds only two entries in the product catalog today. The cheapest price you can get today in Amazon’s “Home Improvement” department is $52.50 (new). Compare this with the best price of $30 for the same Hub (new) in the “Electronics” department, and you will probably agree that Data Quality is something that costs Consumers time and money.
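
For readers who work with catalog data themselves, duplicate entries like these could be spotted with a small script. The following Python sketch is only an illustration and not how Amazon matches products. It assumes that the part number appears somewhere in the listing title. The regular expression, the function names, and the pairing of prices with titles are our assumptions; only the titles and the three price values come from the text above.

```python
import re
from collections import defaultdict

# Assumed pattern for part numbers such as "F5U237v1"; it is an
# illustration, not an official Belkin or Amazon format.
PART_NUMBER = re.compile(r"F5U\d{3}V\d", re.IGNORECASE)


def find_part_number(title):
    """Return the upper-cased part number found in a title, or None."""
    match = PART_NUMBER.search(title)
    return match.group(0).upper() if match else None


def price_spread(listings):
    """Group (title, price) pairs by part number.

    Returns {part_number: (number_of_entries, lowest_price, highest_price)}.
    """
    groups = defaultdict(list)
    for title, price in listings:
        groups[find_part_number(title)].append(price)
    return {part: (len(p), min(p), max(p)) for part, p in groups.items()}


# Titles are quoted from the article. Which price belongs to which title
# is an assumption made for illustration only.
listings = [
    ("7-PORT USB 2.0 Hub by Belkin", 52.50),
    ("Belkin Hi-Speed USB 2.0 7-Port Hub F5U237v1", 30.00),
    ("BLKF5U237V1 USB Hub, 2.0, 7-Port, Black", 63.99),
]

spread = price_spread(listings)
for part, (count, low, high) in spread.items():
    print(part, count, low, high)
```

With these sample values, the two titles that contain the part number end up in one group, while the title without a part number stays unmatched. That hints at why matching on titles alone is not enough to clean up a catalog.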

But the Data Quality of the Amazon product catalog does more than make it hard to find the best price on Amazon. To me, the customer reviews are one of Amazon’s most valuable features. Looking at the catalog entry “Belkin Hi-Speed USB 2.0 7-Port Hub – Hub – 7 ports – Hi-Speed USB” and its review, I would probably think twice about buying the Hub. The entry “Belkin Hi-Speed USB 2.0 7-Port Hub F5U237v1” for the same Hub provides a more balanced view from different reviewers. Bottom line: the Data Quality issue shown above also has a negative side effect on one of Amazon’s most valuable features, the customer reviews.

Based on our example, one could argue that it is mostly the merchants who create the Data Quality issues in Amazon’s product catalog. But the example “Belkin USB + Firewire Haub 6 PORT USB 2.0” makes clear that this is not the case. The misspelling in this entry (“Haub” instead of “Hub”) in Amazon’s German catalog (Amazon itself also uses this entry to sell the product) probably makes it difficult for Consumers to find and buy the product.
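
A misspelling like “Haub” is the kind of error a simple check could flag before an entry goes live. The sketch below uses Python’s standard `difflib` module to compare each word of a title against a short list of expected terms. The term list, the function name, and the 0.8 cutoff are assumptions chosen for this example, not anything Amazon is known to use.

```python
import difflib

# Hypothetical vocabulary of terms expected in titles of this product type.
KNOWN_TERMS = ["belkin", "usb", "firewire", "hub", "port"]


def suspicious_words(title, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return {word: suggestion} for words that nearly match a known term."""
    findings = {}
    for word in title.lower().split():
        # Skip numbers and symbols, and words that are already correct.
        if not word.isalpha() or word in vocabulary:
            continue
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            findings[word] = close[0]
    return findings


findings = suspicious_words("Belkin USB + Firewire Haub 6 PORT USB 2.0")
print(findings)
```

For the title quoted above, the only word flagged is “haub”, with “hub” as the suggested correction. A real catalog would need a much larger vocabulary and a human review step, since a near match is not always a typo.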

Closing comment:

Cindy Cunningham (who previously worked on the product catalog at Amazon) provided some interesting details in 2003 about the complex issues Amazon faces around the Data Quality of its product catalog. As we have just seen, Gartner is right: Data Quality is an issue for organizations like Amazon today and will remain one in the future. But it is important to understand that this is not only an “internal” Enterprise issue. Data Quality issues hit every Consumer directly, every day.

The good news comes last: Amazon really cares about the quality of its product catalog and sees it as one of its key competitive assets. In our own best interest as Amazon customers, we hope that top developers will apply for this recent job opportunity (Oct 26, 2007):


Software Development Engineer – Product Matching-022892

Job Description

If you’re looking for engineering challenges related to automatic text processing, scalability, and performance we’re looking for you. The Amazon Product Matching team is responsible for processing millions of product descriptions each day and determining their similarity to other items already in our massive global product catalog. This is a high-throughput Information Retrieval problem that includes components of extraction, relevance ranked full-text search, heuristic processing, and statistical analysis. You will bring a background in not only object-oriented design and software development, but experience in IR and NLP. We will expect you to build and run a high-quality, low-latency service at the core of Amazon’s e-commerce platform. We will support your software development efforts and your professional growth within a small-team culture: ownership will be your guide and your reward.

We work closely with our customers to support each new product category launch as well as incrementally improving the quality of the Amazon product catalog, one of our key competitive assets.

We are seeking experts in building enterprise server side applications with strong emphasis on adaptability and auditability. Team goals include exploring state-of-the-art data storage techniques while scaling the present service.


* B.S. Computer Science and 2 years industry experience
* M.S. Computer Science and/or 5 years experience preferred
* Proficient in C++ and/or Java as well as Linux data processing tools
* Skilled OOD
* Experience with distributed software architectures
* Professional coder / knows how to write robust, high-performance code

Apply online to this position or visit Careers at Amazon to see all our current opportunities.

Thank you.
Amazon Recruiting Department


Some questions for our readers:

  • Have you found similar Data Quality issues on Amazon or other major online shops?
  • If you work on Data Quality issues yourself: are there tools on the market for addressing Data Quality issues in product catalogs that you would recommend based on your own experience?

Thank you for reading.


Posted on November 1st, 2007 at 9:00 pm and filed under Issues explained.

3 Responses to “Data Quality issues cost you as a Consumer time and money: Shopping on Amazon as an example”

  1. Beth Breidenbach Says:
    November 27th, 2007 at 6:23 am

    I haven’t assessed customer-facing online data, but have seen the same issues in ERP systems. Product descriptions are notoriously hard to standardize, and are something of a plague on industry as a whole. (Or at least, they’re a poorly-addressed research space when compared to the work done to date in standardization of names and addresses.)

    Most data profiling and data quality products can be applied to product data — some better than others. But, if you don’t manage the data centrally (i.e. Master Data Management) you’ll be continuously cleaning up the records.

    The $64,000 question is how best to implement product master data management/cleansing in a dynamic, user-managed world like Amazon or eBay.

    btw, I’ve cited your article in my blog. You can find the post at

  2. Daragh O Brien Says:
    December 16th, 2007 at 7:12 pm

    Excellent analysis of Amazon’s issues, some of which we’ve featured on The link to the ‘insider view’ is interesting.

    My only concern is that Amazon’s job advert seems to see this as a ‘software’ problem. However, unless they have an equivalent (if not greater) focus on the business process side of things (e.g. what processes create/update product data) and on the governance around those processes, they will effect very little change. Tom Redman (one of the first Data Quality gurus) is quite blunt about it – “if you are in IT and you’re asked to fix information quality, get out and work in the business because the changes necessary can’t happen in IT”.

  3. Example of how Data Quality issues can ripple through a service and reduce the quality of the user experience: the Amazon case revisited | Scharnetzki´s - line of reasoning Says:
    February 3rd, 2008 at 5:28 pm

    [...] a previous post we discussed Data Quality issues in Amazons product catalog. We showed the details of how those Data Quality issues can make it for Consumers difficult to find [...]
