Magento: How to build reporting datawarehouse for Magento?

Disclaimer : Below information is from my experience with Magento platform.

Right! so you have experienced the kiss of death by your awesome report or a feed generation in the dead of night ?

Why ? Data intensive jobs especially mundane tasks like sitemap/google feeds generation take up lot of your DB power and results in website unable to take orders and what not.. even website not available ;) So, you need to decouple your data from the production server.

We have done that and here are some thoughts to help you build  your reporting data warehouse.

  • Setup MySql Replication in another MySql box(best way else a complete dump takes 30 mins and delta sync takes around 8 minutes depending on the size)
  • Flatten Data (We have shared our schema here)
    • Sales :
      • sales_flat_order : Might be worth adding the customer_created_date for Cohort Analysis and also other customer info like address etc ( city, region without personal info). Keep an eye on the base totals and the order totals ;)
      • sales_flat_order_items : This contains the line items with their SKUs. Pretty straightforward but you might want to add the config product information on the line item basis for easier joins later on in the BI tools and feeds.
    • Credit Memos/ Shipping
      • Really a one on one mapping. Usually this information is in the order level, so you might not use these tables really unless you want daily refund reports and snapshots
    • Products
      • General: Rows for simple and configurable product information
      • Adding configurable info to simple rows: Simple SKUs are used in transactions which are used for reporting. It might be a good idea to add new columns like configurable_name, configurable_sku, configurable_product_id etc to the end so that your analysis later becomes much easier, specially in slow Business Intelligence tools. Else this is another query
      • Final price: This is the price seen in the website. The ecommerce platforms check for any valid special prices based on special_to and special_from; else puts the shows price in the website. While reporting you want to put the LIVE PRICE and not the original or special price. So think about this ;)
      • Cost : Regarding the cost, usually websites purchase produce in one price and currency. Hence you might want to pick this up from the default scope of cost : cost_admin which is constant for all sites in one currency.
      • Attributes with website scope: Eg. price_gb, price_us; url_gb, url_eu, name_gb, name_us etc .
      • Attributes with Translations: Do you need the Arabic Labels in your Feeds ?
    • Store ID mapping : All tables above refer to store ids in the columns. You will need a handy map for reference to understand which order is from which store and apply the related exchange rates to the base totals to roll up to your global currency. Clients in UK prefer all transactions to be rolled to GBP.
    • Base Currency Mapping : This is important for your conversions to the Global Currency since orders for each store is stores in the store base currency and the order transaction currency and most of the time you want to see the information in one single currency which is what you work with. Clients in UK prefer all transactions to be rolled to GBP.
  • Sync Type
    • Full Sync : We recommend a full sync once a week
    • Partial Sync : Partial sync is very easy for order information. Every info like canceled amount, refunded amount needs to be re-synced with time since things are changing daily. For products this might get tricky but I am sure you will find a solution with the index tables.
  • Data Privacy Issues : Due to data prvacy issues, you need to ensure customer identifiable information like postcode, customer_email, first_name, address_line_1 etc should not be replicated to the reporting warehouse.  For all analysis like Cohort analysis, you might have to use the customer_id

What next ?

Point all your feeds to this new fresh database, sleep in peace and release your production of its mess and just focus on making money.

Sample Scheme we have created : Magento Database Schema


Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>