Fast Refreshable Materialized Views

Not many people, including a few reasonably smart ones, understand how exactly materialized views work and what makes certain types of materialized views fast refreshable. Not understanding how they don’t work is ok, but expecting them to work for every situation and trivializing the problem at hand to simply throwing a materialized view into the design is what prompts me to write about materialized views. So, let me try to explain them without using technical jargon.

First a few definitions.

Materialized Views are database objects that are like views in the sense that their definition is based on a SQL and also like tables in the sense that the result of the SQL is actually materialized (stored) and hence the name materialized views.

The next question is, if the SQL results are actually materialized, how are these results kept in sync when the underlying data changes? This is where, the idea of refreshing a materialized view comes into picture. Obviously the choice is either to keep updating the materialized data either instantaneously as the underlying data in the SQL is changed or to do it periodically. So, those that can be refreshed instantaneously are refreshed on commit. Otherwise, they have to be refreshed in deferred mode by explicitly calling some APIs.

Note though, only certain type of materialized views can be refreshed fast. We will get to the details soon. But suffice it to say, if a materialized view can’t be refreshed fast, then it has to be refreshed full and that means, such materialized views can’t be refreshed on commit.

Ok, now let’s get to the details. Any SQL query that can be maintained incrementally can be fast refreshable. Otherwise, it has to be refreshed full. Now I will give a real example to explain this.

Say a university professor is meeting each freshman student on the first day and trying to gather the following statistics. The min, max, average SAT scores along with the total number of students. Now, say that the prof had only a small sheet of paper just enough to write a handful of numbers while the number of students joining is in thousands. So, instead of writing down the scores, the prof choose to just maintain these metrics.

So, the first student came and the score is 2000. So, the min, max and avg are 2000 and the count is 1.
Second student score is 2100. The min remains 2000. max is 2100. avg is 2050. count is 2.
Third student score is 2200. The min remains 2000. max is 2200. avg is …

oops, how to find the average? It’s simple. Take the previous average of 2050 and the count of 2, that gives the total as 4100. Now add the 2200 and divide by 3. The result is 2100.

So, to find the average based on the 3rd student, it’s not necessary to go back to the first two students and get their scores. Just the current average can be used (along with the current count) to update the average based on the next available score.

Any piece of information that can be computed incrementally, based on the existing and computed information and the new data, is fast refreshable. Since in real world applications, it’s not always new data, but also updates and even deletes, it’s important to consider all the cases.

Going back to the example, the professor can keep on updating the metrics as new students are coming in and giving their SAT scores.

Now say, after some 257 students, one of the earlier students comes back says “Professor, sorry I made a mistake. My score is not 2000, but it’s 2010”. What then? How does the professor update his metrics? If the minimum by that time happens to be 1900, then there is no change required to the minimum. But what happens if the minimum happened to be 2000? Can the professor update it to 2010 since the student with the min score changed from 2000 to 2010? The answer is no. This is because, among the 257 students, if any of them had a score between 2000 and 2009, there is no way to immediately conclude that 2010 is the minimum score. What this means is, the professor is not in a position to immediately figure out the minimum score based on just the current minimum score and an update to one of the existing scores which happened to be the minimum. Thing is, with metrics like min and max, some times they can be derived based on the existing info (if the value being changed happened to be neither the current min or the current max) and sometimes it’s not possible. However, the count remains the same and the average can still be maintained accurately. All it needs is to figure out the total sum by multiplying the current average with the current count and then adding 10 (2010-2000) and then again dividing using the current count. Ofcourse, as there is a chance for losing precision when dividing and multiplying it’s important to keep the SUM and COUNT metrics rather than the AVG directly.

So, the above example should give an idea of what we mean by being able to manage a metric incrementally. When information is created, updated or deleted, if it’s possible to update the metric based on the current value of the metric and the delta changes, then it’s possible to fast refresh such metrics. This is one of the reasons why SQLs that contain transient values such as current time (sysdate) can’t be fast refreshed because as time passes, infact without any change to the underlying data, the metric is constantly changing. Outer joins also have similar issues.

Another of my favorite examples where it’s not possible to refresh fast is the count distinct. This is not possible because, going back to the earlier example of SAT scores, if the professor is keeping track of the count of the distinct scores, when a new score comes, how does he know if it’s a value that’s already been factored in or not unless the list of all the distinct values are also maintained? See, it’s not that hard to understand the fast refreshable capability of materialized views, if you assume that there is limited resources to keep track of the metric and if the metric can be tracked in various scenarios like create, update and delete without referring back to the past data, then it’s possible.

Advertisements

6 Comments

Filed under materialized views

6 responses to “Fast Refreshable Materialized Views

  1. Srikanth

    Hi,
    Thank you for the wonderful piece of information. But I have a question can a materialized view have
    derived values meaning that

    if a table t1 has id1,id2,id3
    as the columns

    can i create a materialized view as below

    select id1,case when id2=0 then 1 else 2 end, id3
    from t1

    Thanks,
    Srikanth (adibhatlas@gmail.com)

  2. S

    Hi Srikanth, materialized views are used mainly for two purposes. To materialize the joins and/or to materialize the aggregates. Your simple use case of “derived values” doesn’t really warrant a materialized view since it appears per your example, that each record in the base table translates to a single record in the materialized view as per the SQL. In this case, I recommend you to note use a materialized view at all as the cost of extra overhead to maintain the materialized views is probably more than the benefit of pre-computing the derived value.

    Having said that, I don’t see any reason why the above derived value can’t be fast refreshable. On the other hand, if you further try to aggregate such a derived value, then depending on the type of aggregation, it may or may not be fast refreshable.

  3. Sachin

    Hi,

    Thank you for the information given by you on materialize view. But I have a question, can we use functions for get values in the column of select statement of MV with Fast refresh option.

    Like, I have a function getEmp_Name(Emp_ID).
    I am using it in the select statement of MV like ‘select getEmp_Name(T.Emp_id),….. FROM TRANS T …’

    Thanks,
    Sachin Jadhav

  4. S

    Hi, you need to make your user defined function deterministic.

    “DETERMINISTIC Clause

    Specify DETERMINISTIC to indicate that the function returns the same result value whenever it is called with the same values for its arguments.

    You must specify this keyword if you intend to call the function in the expression of a function-based index or from the query of a materialized view that is marked REFRESH FAST or ENABLE QUERY REWRITE. When Oracle Database encounters a deterministic function in one of these contexts, it attempts to use previously calculated results when possible rather than re-executing the function.”

  5. Rob Goretsky

    Just wanted to thank you for this very clear explanation of FAST refresh. It is better than anything I have seen in official Oracle documentation! I use the “SAT Score” example all the time when explaining to coworkers about why a certain aggregate measure would be fast refreshable.

  6. Hi there to every one, the contents present at this web site are
    actually awesome for people experience, well, keep up the nice work fellows.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s