I am developing a project that deals with sports data. I receive XML feeds every second, and these XML files are parsed and saved into an MS SQL database. My problem is that the database size is growing every day: millions of records are saved daily, and after just 2 days the database backup is 1 GB. I am concerned that this will affect the performance of the web application, since there will be a page that queries the database every second.
asked Dec 19 '11 at 18:58
Essentially, you're storing millions of 'historical' records, not transactional data. The typical OLTP approach and schema (normalized, entity-relationship) is good for quick updates and immediate, up-to-date queries, but it won't fit your use case.
You should be looking at a multi-dimensional star schema that favors the types of reports you want to generate from all that data.
Simplistic example: instead of querying 1 million rows for one day's worth of total sales (then performing an expensive aggregation at query time), you query a single pre-aggregated row in a daily sales fact table.
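A minimal sketch of that idea, using SQLite in place of MS SQL and hypothetical table names (`sales`, `daily_sales_fact`); the point is that the fact table is aggregated once, e.g. in a nightly batch, so the reporting query reads one row instead of scanning every raw record:

```python
import sqlite3

# Hypothetical schema: a raw OLTP-style table of individual sale rows,
# and a pre-aggregated star-schema fact table keyed by day.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (sale_time TEXT, amount REAL)")
cur.execute("CREATE TABLE daily_sales_fact (sale_date TEXT PRIMARY KEY, total_amount REAL)")

# Load a handful of raw rows for one day.
rows = [("2011-12-19 10:0%d:00" % i, 10.0) for i in range(5)]
cur.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# OLTP approach: scan and aggregate every raw row at query time.
oltp_total = cur.execute(
    "SELECT SUM(amount) FROM sales WHERE sale_time LIKE '2011-12-19%'"
).fetchone()[0]

# Star-schema approach: aggregate once (e.g. in a nightly batch job),
# then the report reads a single pre-computed row per day.
cur.execute(
    "INSERT INTO daily_sales_fact "
    "SELECT date(sale_time), SUM(amount) FROM sales GROUP BY date(sale_time)"
)
fact_total = cur.execute(
    "SELECT total_amount FROM daily_sales_fact WHERE sale_date = '2011-12-19'"
).fetchone()[0]

print(oltp_total, fact_total)  # same total either way; the fact table reads one row
```

In MS SQL you'd get a similar effect with a scheduled aggregation job or an indexed view, while the raw feed data stays out of the hot reporting path.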
If your concern is about how to manage the (growing) database size, then you really have no choice but to invest in larger-scale storage and backup systems.
Yes, 1TB drives for consumer desktops are cheap, but if you want a highly available, redundant, high-performance application then you really should be looking at storage-area networks, RAID arrays, and tape backup.