Using a Single Model for Both Data Entry and Reporting Applications

  • By single model we mean that you can use a single Entity Data Model for both OLTP / Data Entry applications and Data Warehouse / Reporting applications in a case when your OLTP schema design follows next rules:

    • You have a single source OLTP database or multiple source OLTP databases that do not overlap in terms of tables and data.
    • You have no more than 3 very large tables to join.
    • 3rd normal form or higher.
    • Column names follow naming conventions that work for end users.
    • Only surrogate integer keys are used for all primary and foreign keys.
    • Null values are not allowed in neither primary nor foreign keys.
    • String values are replaced with integer keys for fields that are often being heavily used as MDX query filters or aggregation keys (this one may or may not be important in your case).

    Such rules make sense for both OLTP source and data warehouse anyway, though you will likely have some differences between OLTP and data warehouse physical and logical schemas (e.g. different indexes and views, denormalizing indexed (materialized) views in data warehouse and so on). But such differences do not prevent you from using the same Entity Data Model for both OLTP and data warehouse. In many cases you can use the same Entity Data Model even for quite different physical DB schemas employing updateable and regular views, but that makes your solution more complex. Of course you may have a different set of attributes and behavior between data entry and reporting application models, but it is not an excuse good enough to add complexity and duplication using very different non-conforming models for both. We rather suggest to put common attributes and behavior into common interfaces and base classes, and specialize them for each application via inheritance and/or composition (even if you use CQRS). Such approach will eliminate a need in complex translations (transform phase of ETL) and structural synchronizations between different models and will help spread ubiquitous language. Do not Repeat Yourself principle is essential for Agile application development.

    Single model may be too hard to achieve if data originated from the same tables is owned by multiple teams, e.g. one team owns writes to a table and others work on reporting and data warehouse. In such a case we recommend to restructure teams to do all related work for the tables that they own, from data entry to reporting and cubes. For Agile workflows the best way to separate team responsibilities is by structuring teams according to Bounded Contexts (e.g. sales, accounting), rather than by functional similarity (e.g. Web UI, Web-services, OLTP, data warehouse). If that is not feasible and when communication between teams is impeded, you might have to partially or completely abandon an idea of a single model for both writes and reads, or switch back to waterfall workflows.

    Many data warehouses follow Kimball's model and are built employing a star schema. That is because SQL joins are slow for typical analytical workload with lots of rows to join and aggregate, and because SQL is hard to use when many tables have to be joined.

    But that is an optimization for SQL, and SSAS in MOLAP mode usually works faster with normalized schema for both data processing and querying. Data is pre-joined by MOLAP, so it is very easy to query anyway, and there is no need to denormalize data manually. So you can get the most simplicity, agility and flexibility by using the same normalized physical schema for both source OLTP and data warehouse.

    In a case of 3NF data warehouse OLAP cube acts as a star schema data mart, and ETL phase becomes so trivial that it can even be completely replaced by automatic replication or mirroring. (We recommend SQLBulkTool in cases when replication or mirroring are not a viable option). The need to fix/change data in multiple places disappears, and there is no redundant data anymore, so you have a single source of truth. No more painful decisions for you regarding where data is correct and where it is not. All you need to do in order to fix all your data issues is to fix them in a single place in a source DB, re-sync your Mirror OLTP, reprocess your OLAP cubes and rebuild optional indexed (materialized) views. We believe that this is the cleanest, simplest and most Agile solution, robust in face of constant changes and built on a solid theoretical ground. (For more details regarding Mirror OLTP see SQLBI methodology).

    If you do not keep track of historical data changes at all, or do it in OLTP source side (e.g. Event Sourcing model), we suggest to combine both Mirror OLTP and data warehouse in a single 3NF database, Such 3NF database may have views that denormalize / flatten dimensions for easier SQL access (see "Do not Overuse SQL and Star Schemas" side note).

    In case you keep track of historical data changes on data warehouse side (e.g. slow changing dimensions), we suggest to separate Mirror OLTP and data warehouse databases, but still use mostly identical physical DB schemas for both, eliminating or simplifying transform phase of ETL. (See "Snowflake Schema and History of Changes" side note for details).

    3NF data warehouse model is a foundation of Inmon's data warehouse model. We suggest to merge, unify and simplify both Kimball's and Inmon's models. Both models have good ideas, which could be mixed to match your specific situation. Such mixed solution is often referred to as a hybrid model. There is nothing that prevents you to use Kimball's bus and conforming dimensions, together with Inmon's like 3NF data warehouse (which can sometimes be viewed as a decomposed and "snowflaked" case of Kimball's model).  Nothing stops you from having your data warehouse physical model in a 3 normal form, as Inmon suggests, and use a dimensional Kimball's model based on a fully normalized snowflake schema. In fact, that is exactly what we recommend to do in most cases when you have MOLAP / HOLAP cubes, or if you can build them.

    Overall our suggested Single Model approach is similar to SQLBI methodology, except that we recommend to replace star schema data marts with OLAP cubes and optional flattening views for ad hoc SQL. We also recommend to avoid using multiple different DB schemas and complex ETL with transformations and mutable Staging DB, whenever you have this option.

     

    Comments:

     

    See also: