Low PLE but high Buffer Cache Hit Ratio

Recently I worked on an interesting scenario where an incident was reported for very low PLE value. Very low PLE value indicates memory pressure on SQL server and should be fixed to ensure good SQL server performance. First lets check the definitions of PLE and BCHR as per Microsoft BOL:

Page Life Expectancy: Indicates the number of seconds a page will stay in the buffer pool without references

Buffer Cache Hit Ratio: Indicates the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server or by using the buffer pool extension feature.

In performance tuning, we should not be dependent on one parameter. We should follow a holistic approach by evaluating multiple parameters and summarizing the final outcome from them.

Consider below scenario in my case when I checked multiple parameters related to buffer manager and memory manager for SQL instance:

  1. Buffer cache hit ratio was constantly higher than 99%.
  2. Lazy writes/sec was constantly zero.
  3. Page reads/sec value was very low.
  4. Free list stalls/sec was constantly zero.
  5. Page writes/sec value was also very low.
  6. Page life expectancy value was between 0 to 100 on a server with 256 GB RAM with Max Server Memory as 248 GB.
  7. Checked activity monitor and there were no expansive queries running.
  8. Memory utilization on server was just 3% with 7 GB RAM usage.
  9. Total Server Memory was ~2.5 GB and Target Server Memory was ~248 GB.
  10. There were no memory grants pending.
  11. SQL Server error messages logs:
    A significant part of SQL server process memory has been paged out. This may result in a performance degradation. Duration: XXXX seconds. Working Set (KB): 68000 KB, committed (KB): 2616386, memory utilization: 25%.

    First five observations suggest that server was at low utilization compared to its capacity and nothing was wrong with server performance. However when we consider points 6, 9 and 11, we can conclude there is a memory pressure on the SQL instance.

To understand more of this situation, lets have a look at below further observations:

  1. SQL service account was not having Lock pages in memory permissions
  2. Default values of zero and 2,147,483,647 megabytes (MB) for Min server memory and Max server memory respectively.

To conclude, without lock pages in memory permissions , zero min server memory and 3% RAM utilization, it was a case of low utilization not memory pressure. Page out messages were due to very low working set memory and no LPIM permissions. Please note that  SQL Server does not immediately allocate the amount of memory specified in min server memory on startup. Refer this for details of min server memory.

I also confirmed SQL performance with running some of SELECT statements without any issues. Page out messages are logged when the working set of a SQL Server process reaches 50 percent or less for the memory committed to the SQL Server process.  Due to page out working set was getting below 50 percent of committed memory and message mentioned in point 11 was getting logged.

Kindly also refer this article which suggest that page out messages can be logged with no activity on the server. Article states “On a server with no databases to recover and no activity, it is quite possible to see the above behavior.”

Based on above analysis, I safely ignored the low PLE warnings and page out messages.

However if values of Page reads/sec, Free list stalls/sec, Lazy writes/sec is high, this would be case of memory pressure in spite of high value of buffer cache hit ratio.

Kindly read this wonderful article by Jonathan Kehayias explaining buffer cache hit ratio in detail.

Kindly share your views and scenarios with PLE.

Hope this article will help you in troubleshooting memory issues!

Happy learning!!