During my research into prior art I discovered Anthony Eden's ActiveWarehouse and ActiveWarehouse-ETL projects, and gave them a test drive using a fictitious "Cupcakes Inc" site.
I presented this at the Jan 2010 Singapore Ruby Brigade meetup held at hackerspace.sg. My "point-of-view" slides are embedded below, and you can find the sample project and doco on GitHub.
Conclusions?
- ActiveWarehouse is a textbook implementation of classic data warehousing techniques. That was clearly Anthony's intention, but it also means it does not really attempt to explore how data warehousing might be approached quite differently with Ruby and Rails
- ActiveWarehouse/ETL are not for the faint-hearted. When you get them working, they work well, but the lack of documentation basically means it's inevitable you'll end up reading the sources to figure it all out
- I have concerns about scalability. Having worked on terabyte warehouses using "classic" technology, I know just how far you have to push databases in order to scale. This bears more investigation and testing before it would be sensible to commit to ActiveWarehouse for a large-scale DWH implementation
Nevertheless, ActiveWarehouse and ActiveWarehouse-ETL are interesting projects, and the underlying implementations make for some educational code reading. Hopefully my slides and the Cupcakes sample project will add a bit to the available documentation, and give a bit of a leg up to anyone interested in checking out these projects ;-)
Soundtrack for this post: Information Overload - Living Colour