A Solution for Avoiding "Stale Read" in Real-Time Data Flowing Systems


Background of the Problem

Real-time data flowing systems have the following characteristics:

Typical real-time data flowing systems include industrial monitoring systems, sensor monitoring systems, location service systems, e-business websites, telecom billing systems, and server logging systems.

What's the Problem?

Performance optimization is a key requirement for the collection, storage, transformation, querying, and statistical analysis of data in real-time data flowing systems. In actual implementations, performance optimization techniques are applied at multiple levels, such as the following:

However, the "Stale Read" problem can arise when these performance optimization techniques are applied. For example, a cache of persistence objects can leave queries against the slave database unaware of data changes. When a row is updated in the master database, the slave database receives the update, but the JPA cache on the slave side does not know that the row has changed, because the update came from outside that JPA instance. Queries against the slave database then return the outdated row.
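The failure mode described above can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions, not JPA code: the `master` and `replica` dictionaries, the `CachedReader` class, and the `user:1` key are hypothetical stand-ins for the two databases and the persistence-object cache on the slave side, and replication is simulated by a plain assignment.

```python
class CachedReader:
    """Reads rows from a replica through a local object cache.

    Hypothetical stand-in for a persistence-level cache on the slave side.
    """

    def __init__(self, replica):
        self.replica = replica
        self.cache = {}

    def get(self, key):
        # Only hit the replica on a cache miss; later reads reuse the cached row.
        if key not in self.cache:
            self.cache[key] = self.replica[key]
        return self.cache[key]


# Both databases start in sync.
master = {"user:1": "alice@old.example"}
replica = dict(master)
reader = CachedReader(replica)

# First read populates the reader's cache.
first = reader.get("user:1")

# The row is updated on the master and replicated to the slave.
# The reader's cache is not told about it.
master["user:1"] = "alice@new.example"
replica["user:1"] = master["user:1"]

stale = reader.get("user:1")   # served from the cache: the old value
fresh = replica["user:1"]      # what the slave database actually holds
```

Here `stale` still holds the old address even though the slave database itself already contains the new one, which is exactly the Stale Read described above.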

How can Stale Read be avoided while still taking advantage of performance optimization techniques?

Solution to the Problem

In my experience, persistence objects must be refreshed in both the master and slave databases when data is created or deleted. If a kind of data is only ever created or deleted and never updated, then Stale Read never happens for that kind of data. So the solution is to classify the data in the system: safely apply read-write separation together with caches to data that is never updated, and for data that will be updated, either perform all operations on the master database or disable the persistence-level cache.

Luckily, the core data in most real-time data flowing systems is only created or deleted. For instance, in a location service system, the bulk of the accumulated data consists of coordinate values, which never change once recorded and can therefore take advantage of read-write separation and caches. In contrast, all user management operations in the same system run against the master database, since user data changes frequently. Performance is unlikely to suffer, because the volume of user data is limited.
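The routing rule from this section can be sketched as follows. This is a minimal illustration under assumptions, not a definitive implementation: the `DataRouter` class, the kind names `coordinate` and `user`, and the in-memory dictionaries standing in for the master database, the slave database, and the cache are all hypothetical, and replication is simulated as an immediate copy.

```python
class DataRouter:
    """Routes reads by data kind (hypothetical sketch).

    Kinds that are only created or deleted are read from the replica
    through a cache. All other kinds are read from the master, uncached.
    """

    def __init__(self, append_only_kinds):
        self.append_only_kinds = set(append_only_kinds)
        self.master = {}
        self.replica = {}
        self.cache = {}

    def create(self, kind, key, value):
        self.master[(kind, key)] = value
        self.replica[(kind, key)] = value  # replication simulated as immediate

    def update(self, kind, key, value):
        # Updating a create/delete-only kind would reintroduce Stale Read.
        if kind in self.append_only_kinds:
            raise ValueError(f"{kind} is classified as create/delete only")
        self.master[(kind, key)] = value
        self.replica[(kind, key)] = value

    def delete(self, kind, key):
        self.master.pop((kind, key), None)
        self.replica.pop((kind, key), None)
        self.cache.pop((kind, key), None)  # refresh the cache on delete

    def read(self, kind, key):
        if kind in self.append_only_kinds:
            # Safe to cache: the value cannot change after creation.
            if (kind, key) not in self.cache:
                self.cache[(kind, key)] = self.replica[(kind, key)]
            return self.cache[(kind, key)]
        # Mutable data: always read from the master, bypassing the cache.
        return self.master[(kind, key)]


router = DataRouter(append_only_kinds={"coordinate"})
router.create("coordinate", 1, (31.23, 121.47))
router.create("user", 1, "alice@old.example")

coord = router.read("coordinate", 1)   # cached read from the replica
router.update("user", 1, "alice@new.example")
email = router.read("user", 1)         # read from the master, never stale
```

Rejecting updates on create/delete-only kinds is a design choice of this sketch: it turns a classification mistake into an immediate error instead of a silent Stale Read.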