Can you please guide me to what would be the preferred hardware requirement for a data warehousing setup with the following components:
An ETL process to populate the entire data warehouse.
Data warehouse size 100-1000 GB, 25 million fact records with five to six dimensions having more than 50,000 members, and one to two SCD's having 1 million records. The other 35-38 dimensions having less than 10,000 members.
- The Microsoft analysis services containing:
- Eight regular cubes with distinct count measure in it.
- A single regular cube consisting of around 20-25 additive (SUM) measures.
- Six virtual cubes consisting distinct count, additive and calculated measures with 130-140 measures in total.
The first thing I'm going to look at is database size. You have provided a very large range of database size, from 100 GB to a terabyte. I would recommend differently for each extreme so I encourage you to try to get more specific about this. And furthermore, is this initial data warehouse size or does it account for growth?
We use database size as a primary determinant of platform efficiency because, in larger implementations with hundreds of users, usage tends to become predictably multi-faceted. In other words, most 3 TB data warehouses will have 250–400 users with 75% doing desktop OLAP (across a complete bell curve in terms of actual usage volume), 15% writing reports and 5% data mining, etc.
Pick a platform that works for at least your anticipated three-year scale. For example, if the system is not anticipated to become a terabyte within three years, you can choose a system that will not work today at the terabyte level. If you happen to get to that level in, say, five years, in all likelihood, the system you choose will also have scaled itself to support that level of usage then. In other words, let's not buy a system to "kill an ant with an atom bomb."
When I say "work," I mean work with high likelihood of success and without the need to tune a system at its 99th percentile of tuning. The systems we recommend have a "low" risk factor for BI platforms which means a low likelihood of not scaling or needing augmentation with other systems (you want to buy systems that "do it all") and having to be replaced.
You have provided information about the database, but not the users and usage. That will also be an important input to the hardware requirements. Usually BI systems will have heterogeneous usage that approaches a vibrant usage mix. But maybe your shop won't. Another assumption is that an end-user shop would not wish to become an R&D shop for niche software in its main areas of BI software need. Here again, maybe your shop is OK with taking some chances.
Nonetheless, your shop is "up there" in terms of SQL Server usage and you may consider something like a Dell PowerEdge 8450 8 Intel® Pentium® III Xeon™ 900 MHz processors with 24-32 gigabyte (GB) RAM but do get more specific with the requirements before you go shopping.