While reviewing your calculation for optimizing test intervals based on economic criteria, I find it important to emphasize that safety should always be the top priority when it comes to protective devices. The goal is to reduce risks to a tolerable level, especially when there is a potential threat to human life or the environment. It can be challenging to accurately determine factors like scale and shape for a single protective device due to limited data points. Therefore, we often make approximations by using data from similar devices to calculate average values such as Mean Time To Failure (MTTF).
It is worth noting that our approximations may not always satisfy criteria such as 'independent' and 'identical' when forming device families, which can introduce errors. Additionally, working with MTTF alone implicitly assumes an exponential distribution, since the pooled data are rarely sufficient to estimate a shape factor accurately. In my experience with MTTF computations using PRV bench-test results in Oil & Gas services, failure-to-lift events occur with an MTTF in the range of 60-250 years.
In practical terms, when considering the test interval for a PRV with an MTTF of 200 years, the availability rates of 99.5% and 99% correspond to test intervals of 2 years and 4 years respectively. These intervals impact the occurrence of failure events and ultimately, the level of risk tolerance. It is crucial to balance economic pressures for longer test intervals with ensuring the availability of PRVs to mitigate additional risks effectively.
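The interval-to-availability figures above follow from the standard fractional-dead-time approximation for hidden failures, U ≈ T / (2 × MTTF), which holds while T is small relative to MTTF. A minimal sketch in Python:

```python
def mean_unavailability(test_interval_years: float, mttf_years: float) -> float:
    """Mean fractional dead time of a hidden-failure device under periodic
    testing: U = T / (2 * MTTF). Valid only while T << MTTF."""
    return test_interval_years / (2.0 * mttf_years)

# PRV with MTTF = 200 years, as in the example above
for T in (2, 4):
    A = 1.0 - mean_unavailability(T, 200)
    print(f"T = {T} years -> availability = {A:.1%}")
# T = 2 years -> availability = 99.5%
# T = 4 years -> availability = 99.0%
```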
The importance of regular testing for PRVs cannot be overstated, as these devices play a critical role in preventing catastrophic events like explosions. While cost considerations are significant, the primary focus should always be on maintaining the safety and reliability of protective devices in service.
- 27-11-2024
- Heather Coleman
After receiving feedback from colleagues, I have decided to implement a new strategy for system maintenance and analysis. Here are the key points of the approach:
- Consider a long period (P) of the system operating, such as a few years.
- Assume the protected function fails according to an exponential distribution.
- Assume the protective device fails according to a Weibull distribution.
- Set a specific inspection time interval (T).
- Whenever the protective device fails, it will be repaired, taking (r) hours, and the next inspection will start from 0. The device will be restored to a new state after repair.
- Estimate the time to the next failure of the protective device during each inspection based on conditional probability.
- Keep track of the number of inspections (I), failures of the protected function (N), failures of the protective device (n), and multiple failures (m), where the protected function fails while the device is in a failed state.
- Assign a cost of $Ci to each inspection and a cost of $Cf to each multiple failure.
- Calculate the reliability of the system (protected function and protective device) as (1 – m/N), i.e. the fraction of protected-function failures during which the protective device fulfilled its role.
- Determine the unit cost of system operation as (I*Ci + m*Cf)/P $ per hour.
- Assess the availability of the protective device as (1 – n/I), i.e. the fraction of inspections at which the device was found in working order.
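One possible reading of the procedure above, sketched as a Monte Carlo simulation in Python. All parameter values are illustrative, and treating demands that arrive during the short repair window as covered by the freshly repaired device is a simplifying assumption of this sketch:

```python
import random

def simulate(P, T, r, mttf_func, beta, eta, Ci, Cf, seed=42):
    """Monte Carlo sketch of the inspection scheme described above.
    P: operating period (hours); T: inspection interval; r: repair time (hours).
    The protected function fails exponentially (mean mttf_func); the hidden
    protective-device failure time is Weibull(shape=beta, scale=eta)."""
    rng = random.Random(seed)
    renew = 0.0             # time of last device renewal (as-new after repair)
    I = N = n = m = 0       # inspections, demands, failures found, multiple failures
    dev_life = rng.weibullvariate(eta, beta)   # device life measured from renewal
    next_insp = T
    next_demand = rng.expovariate(1.0 / mttf_func)
    while min(next_insp, next_demand) < P:
        if next_demand < next_insp:            # protected function fails (a demand)
            t = next_demand
            N += 1
            if t - renew >= dev_life:          # device was down: multiple failure
                m += 1
            next_demand = t + rng.expovariate(1.0 / mttf_func)
        else:                                  # inspection
            t = next_insp
            I += 1
            if t - renew >= dev_life:          # hidden failure found
                n += 1
                t += r                         # repair restores device to as-new
                renew = t
                dev_life = rng.weibullvariate(eta, beta)
            next_insp = t + T                  # inspection clock restarts from repair
    return {
        "I": I, "N": N, "n": n, "m": m,
        "reliability": 1 - m / N if N else 1.0,    # point on reliability above
        "availability": 1 - n / I if I else 1.0,   # point on availability above
        "unit_cost": (I * Ci + m * Cf) / P,        # $ per hour
    }

# Illustrative run: 5 years, monthly inspections, 24 h repairs
res = simulate(P=5 * 8760, T=720, r=24,
               mttf_func=2000, beta=2.0, eta=8760, Ci=50, Cf=100_000)
print(res)
```

Sweeping `T` over a range of candidate intervals and picking the one with the lowest `unit_cost` gives the economic optimum described in the closing paragraph.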
By running simulations and analyzing the data, we can obtain expected values for these variables within confidence intervals. By testing different inspection time intervals and finding the optimal one that minimizes combined inspection and failure costs, we can mitigate risk effectively. Your input and suggestions are appreciated. Thank you. Rui
Hello Vee, I believe that if multiple failures do not pose a risk to safety or the environment, it is reasonable to consider adjusting inspection intervals based on economic factors. Can you provide a counterargument to this logic? It is worth noting that simulating failures in computer systems requires the use of a process generator to generate random failure times, even though the exact timing of a protective device failure may never be known in reality. Regards, Rui
After reconsidering, I believe the time frame for inspections should be based on elapsed time, such as every two weeks, rather than accumulated time. This is because inspections are typically done in groups of equipment. Therefore, the number of inspection tasks completed in a given period (P) can be calculated by dividing P by T. Can you confirm if this approach is correct? - Rui
Rui stated that if multiple failures do not pose a threat to safety or the environment, there is no reason why inspection intervals cannot be based on economic reasons. He challenges anyone to provide a valid reason otherwise. While it is true that some protective systems may have hidden functions, the majority of important protective systems impact safety, the environment, or the lifespan of assets.
Examples of these systems include axial displacement trips, over-speed trips, pressure relief valves, press-guards, brake lights, emergency shutdown systems, emergency depressurization systems, conveyor emergency stop buttons, circuit breakers, reverse-current relays, and ship mooring release systems. Although some systems, like overload systems, may only have economic consequences, they are not the main concern. The presence of a trip system indicates high economic stakes. Therefore, there is little flexibility in adjusting test frequencies.
It is important to note that not all test frequency determinations can be analyzed purely from an economic standpoint. There is a risk that readers without a deep understanding of the subject matter may be misled into believing that all frequency determinations are open to economic analysis.
In the realm of reliability engineering, it is common for protected systems to experience degradation through various mechanisms such as corrosion, wear, fouling, and crack propagation. Monitoring the condition of these systems is crucial to determine the frequency of inspections based on the P-F curve and uncertainties in demand and degradation. While some degradation mechanisms follow a Weibull distribution, obtaining shape and scale parameters in practice can be challenging.
Furthermore, many protective systems have hidden functions and operate in either a working or failed state. It is often impractical to determine the shape and scale factors in these cases, leading to assumptions such as a shape factor of 1. Reliability data sources provide failure rates for both evident and hidden failures, highlighting the importance of being cautious when analyzing failure rates. Embracing a shape factor of 1 may be necessary to navigate the complexities of reliability engineering.
- 28-11-2024
- Frances Fisher
Thank you, Vee, for your valuable input and suggestions. After considering your feedback, I have decided to implement the failures of the protected function using a Weibull function. Users of the software will have the option to choose a shape-factor of 1, where the scale factor is equal to MTTF, or any other value. This flexibility extends to the economic aspect as well, as there may be differing opinions on the approach. Therefore, I believe it is prudent to keep both options open: a tolerable risk limit or the most cost-effective solution. I will provide an update once I have results to share. Thank you again, Vee and Josh. Best regards, Rui.
Attached are the results of my recent study that I would like to share with you for feedback to enhance the work. The study involves periodic inspections of similar devices on various equipment, focusing on elapsed time rather than running time of each device to ensure reliability and minimize costs. The data collected through simulation demonstrates the failure patterns of the protected function and protective device, as well as the impact of inspections and repairs on system performance and cost. Key outputs include the number of inspections conducted, failures detected, and the overall reliability and availability of the system.
In the simulation test, specific parameters such as failure distributions, repair times, and inspection costs were considered to determine the optimal inspection interval for the protective device. The analysis revealed that inspecting the device every 200 hours provides the most cost-effective solution. Additionally, a formula was suggested to calculate the optimal inspection interval based on the mean time to failure of the devices involved.
I invite your input on the best approach to determine system availability, as outlined in points 13 and 14 of the study. Your feedback is greatly appreciated in further refining this research. Thank you for your ongoing support. Rui
In a previous discussion, it was revealed that Daryl was the one who provided insight on the relationship between availability, failure rate, and inspection frequency. Daryl emphasized the importance of considering the MTBF of both the protective device and the protected function in calculations. According to SAE JA1012, the formula for inspection frequency includes factors such as the failure rate of the device, the failure rate of the function, and the desired level of risk. It is crucial to determine the failure rates within your plant or a similar operating context to make accurate calculations. Thank you to Daryl for sharing this valuable information.
Hello Rui, I hope all is well! The formula mentioned above comes from the RCM standard guide, which has facilitated the widespread adoption of RCM practices. It applies in specific scenarios, such as a protected function with random (exponential) failure behaviour guarded by a single protective device. It is essential to note that there are many configurations of protective devices and associated failure curves, so the formula is not universally applicable. Its primary goal is to achieve a tolerable risk, expressed as the MTBF of the multiple failure. For failure modes with economic consequences, the optimal frequency should instead be based on cost rather than solely on risk, balancing the cost of performing the task against the cost of the failures it prevents. When discussing the unavailability caused by random failures of protective devices, inspection frequency plays a crucial role: for example, a device inspected annually may still accumulate 25% unavailability over a four-year period, depending on its failure rate. In any case, for the formula to be effective, a known failure rate must be available.
Hi Daryl, the answer from my program is rooted in economic factors, but the end goal may vary - such as establishing a minimum availability threshold. The formula I previously mentioned is no longer relevant. I am still seeking an answer to the question I posed: "Which method do you believe is more suitable for determining availability (as outlined in points 13 or 14 in my post from the 5th)?" Thank you, Rui.
In John Moubray's book, the formula "2 x MTBF(Device) x MTBF(Function)/MTBF of multiple failures" is discussed on page 180, along with other formulae for various scenarios. These formulae are estimates and are applicable within a specific range of T/MTBF values. When T/MTBF exceeds this range (approximately 14%), the availability estimation errors become significant. In safety-related situations, T/MTBF typically ranges from 1-3%. On page 39 of my book, I present a formula that is valid over a broader T/MTBF range (around 20%), based on Nowlan & Heap's method. These formulae take the MTBFs of the protective and protected functions and assume exponential distributions. Therefore, a Weibull distribution with a shape factor other than 1 should not be used with them. Allowing users to input their own shape factors can result in unsuitable decisions. For scenarios involving human, environmental, or asset safety, economic analysis should not be the determining factor. The only acceptable criterion in such cases is 'tolerable risk', as there is no uniform way to assign a monetary value to human life. Consequently, economic considerations should not be applied in these situations. I recommend continuing this discussion outside the forum.
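For reference, the failure-finding interval quoted from Moubray can be recovered in a few lines under the stated assumptions (exponential failures for both functions, T small relative to the MTBFs); writing M_d, M_f and M_mf for the device, function and multiple-failure MTBFs:

```latex
% Fractional dead time (mean unavailability) of a hidden-failure
% device tested every T:
U \approx \frac{T}{2 M_d}
% A multiple failure requires a demand (rate 1/M_f) to arrive while
% the device is dead, so the multiple-failure rate is:
\frac{1}{M_{mf}} \approx \frac{U}{M_f} = \frac{T}{2 M_d M_f}
% Rearranging gives the quoted failure-finding interval:
T \approx \frac{2 M_d M_f}{M_{mf}}
```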
Hi Rui, I want to emphasize that making economic decisions based on safety and environmental risks may not be the best approach. The key factor here should be minimizing risks, especially when it comes to operational and non-operational hidden failures. While economic algorithms can be useful in certain situations, they should not be used when it comes to safety and environmental concerns. Prioritizing risk reduction is essential in these cases to ensure the well-being of both people and the environment.
I express my gratitude to Vee and Daryl for their valuable contributions and insights, drawing from their extensive experience and knowledge. When it comes to safety, it is important to acknowledge the difference between conducting research to discover something new, as I endeavored to do, and how users will utilize the findings. While my involvement in this area is limited, I have witnessed operational consequences in some companies due to user actions. However, once a method is established and comprehensive, it is left to the user to analyze and contextualize the results appropriately. For instance, in the scenario I presented, the availability (A) of a protective device can be assessed by the probability of it failing during inspections, which may lead to system failure. Is this approach more fitting, in your opinion? It yielded a 97% probability with a 50-hour inspection interval. While this probability may seem like the desired outcome, the question arises: when should one stop refining the analysis? The possibilities appear endless, so what criteria should be adopted? Unfortunately, my copy of Moubray's book is outdated, preventing me from referencing his insights on the matter. Nonetheless, I am determined to understand the underlying logic behind the formula soon. Simulation techniques offer the advantage of solving complex problems involving multiple variables more efficiently than traditional analytical methods, especially when these variables are interdependent. Regards, Rui.
I highly recommend searching for a copy of "Mathematical Principles in RCM" by Resnikov, as it provides valuable insight into the origins of this field. While it may be considered outdated, it is still worth sourcing a copy for its foundational knowledge.
Dear Daryl, thank you for the suggestion. I conducted a thorough search on the internet, including Amazon, but unfortunately, I could not find any information on Resnikov. It is disappointing as the title seemed intriguing. Do you have any other leads or suggestions? Regards, Rui.
Hello Rui, I apologize for not remembering the specific name of the location where I obtained the document. Unfortunately, I do not have access to my materials as I am currently out of the country. I am confident that someone else on this platform may have more information. The document was sourced from a place similar to the National Library, where the Nowlan and Heap report is also stored. I apologize for the lack of clarity.
Hey Rui, you can find the archived piece of work on Reliability-Centered Maintenance (RCM) on the National Technical Information Service (NTIS) website. The website requires a day-rate membership to access the content. Despite the cost, it is a valuable resource and a key reference in the field of RCM. Best of luck in your search!
I am grateful for your assistance, Daryl. I will make an effort to obtain the document. Thank you, Rui.
Regarding your quote mentioned earlier, the older edition of Moubray's book also contains the same formula. To find it, refer to the index under FFT. Unfortunately, I only have the older edition, so I cannot check what Moubray says about it in the newer one. This formula is based on the concept of Fractional Dead Time, which can be found on pages 59 and 60 of The Reliability of Mechanical Systems by I MechE, with ISBN 0 85298 881 8, or in writings by Trevor Kletz (the specific reference is not available at the moment). This formula has been in existence for a considerable period (even before N&H, Resnikoff, Moubray, or JA 1011). It is an approximation with certain limitations on its applicability, as mentioned before. It should be noted that this formula relies on the use of Mean Values or Failure Rates, i.e. a shape factor of 1 applies.
Thank you, Vee, for sharing your insights. I have a strong background in reliability and maintainability mathematics and am always eager to explore different perspectives from fellow authors. In relation to the application of the Weibull distribution to protective devices, I wanted to mention that the American Petroleum Institute is currently developing a Technical Module that utilizes a risk-based approach for assessing the criticality of pressure relief devices. This includes conventional and balanced bellows, pilot operated, and rupture disks, to establish inspection and testing frequencies. The module recommends specific parameters for the Weibull probability distribution in cases of failures on demand and leakage. For failures on demand, beta values typically range from 1.6 to 2.0, and for leakage, they range from 1.6 to 1.7. Meanwhile, alpha values are suggested to be between 3.5 to 50.5 years for failures on demand and 11 to 17.5 years for leakage. It is important to note that adjustment factors may need to be considered based on individual circumstances. Thus, my decision to use a Weibull distribution in my model instead of an Exponential distribution appears justified. I am pleased to learn that others also share this perspective. Regards, Rui.
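The Weibull parameter ranges quoted above translate directly into failure probabilities via the Weibull CDF, F(t) = 1 − exp(−(t/α)^β). A minimal sketch, with β and α picked arbitrarily from inside the quoted failure-on-demand ranges for illustration only:

```python
import math

def weibull_cdf(t_years: float, beta: float, alpha_years: float) -> float:
    """Probability the device has failed by age t: F(t) = 1 - exp(-(t/alpha)^beta)."""
    return 1.0 - math.exp(-((t_years / alpha_years) ** beta))

# Illustrative values from within the quoted failure-on-demand ranges
beta, alpha = 1.8, 20.0
for t in (2, 5, 10):
    print(f"P(failed on demand by year {t}) = {weibull_cdf(t, beta, alpha):.1%}")
```

With β > 1 the hazard rate increases with age, which is what distinguishes this model from the constant-hazard exponential case (β = 1) discussed earlier in the thread.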
Additionally, the document mentioned above also addresses various costs that are crucial in the industry. These include expenses related to injuries, unit shutdowns, environmental issues such as fines and costs linked to PRD leaks or equipment containment failures, production losses, and unit replacements. This once again highlights the significance of inspecting facilities from an economic perspective. Best regards, Rui.
Dear Rui, I want to express my admiration for your deep understanding of Reliability Engineering and Weibull analysis, as well as your commitment to challenging conventional thinking to uncover innovative solutions. I take on the role of devil's advocate in offering feedback on your recent posts, with the intention of providing constructive peer review. I have a few inquiries regarding the fascinating information you shared. The specifics you mentioned, such as the beta and alpha values for failures on demand and leakage, are intriguing. I am curious about the sources of these values. Were they derived from field trials or actual industrial data? Understanding how 'run lengths' for hidden functions are determined would greatly aid in conducting a Weibull analysis.
Additionally, in your second post, you referenced the 'cost of injury'. How are costs calculated for incidents like fatalities or permanent disabilities? Is this calculation methodology specific to the US, or is it universally applicable across different countries? Are there similar guidelines for assessing environmental damage? If you have any useful URLs to share on these topics, it would be greatly appreciated.
Please know that my questions are meant to enhance our mutual learning and discussion. I have great respect for your expertise and am grateful for the knowledge I gain from your posts.
Hey Rui, I had my assistant reach out to this website earlier this week to follow up on the document "Mathematical Aspects of Reliability-centered Maintenance." It seems like the document is priced at $79.50 USD, with an additional $12.50 for handling fees. I personally find it to be a valuable historical resource. Best of luck!
Hello Vee, I appreciate your interest in sharing your insights and expertise in inspections. I was intrigued by the example you mentioned regarding PRD reliability, whether they are considered to be in parallel or in series. Your contribution to Eugene's thread on "System Reliability calculating software?" was insightful. Regarding your queries, the document I mentioned earlier is currently a draft and is being discussed among members. It states that failure rates and leakage data are based on industry and committee member data, with the Center for Chemical Process Safety collecting data for pressure relief devices. Additionally, default Weibull parameters for pass/fail and leak curves are determined from bench test data. These values are suggested parameters for different fluid services and can be adjusted based on individual device performance in bench tests. A Bayesian approach is also being considered to adjust characteristic life for inspection confidence. As for the "cost of injury", while the document mentions it as a potential cost, it does not provide guidance on how to calculate it. It is worth noting that my friend, a risk analysis expert in the insurance industry, often deals with estimating the cost of injuries. According to him, everything comes with a price in the world of risk analysis, regardless of our personal feelings. Warm regards, Rui.
I want to express my gratitude to Daryl for providing valuable information. Your kindness is much appreciated. Regards, Rui.
- 28-11-2024
- Gregory Hughes
Hello Rui, I hope you are doing well. I completely agree with the Bayesian approach that you mentioned. Bristol University has conducted significant research in this field, so it would be beneficial to explore their work for relevant information.
In your recent post addressing the source data, you mentioned that Weibull parameters for pass/fail and leak curves were derived from bench test data analysis. These default parameters serve as suggestions and are critical for determining the time-to-failure data needed for Weibull analysis. The failure-to-lift of a Relief valve is considered a hidden function failure, making it challenging to pinpoint the exact time of failure. While bench-test data can indicate whether the relief valve is functioning or not, it does not provide the time of failure. The 'maximum-likelihood' method of Edwards can be used to estimate the expected value of time-to-failures, but this value may not align with the actual value. It is unclear how the API or CMA calculated their Weibull parameters. Additionally, the basis of these 'suggested' default values remains uncertain.
You mentioned considering a Bayesian approach to adjust the characteristic life of the relief valve to reflect inspection confidence. However, it is crucial to ensure that the original characteristic life is based on solid data and knowledge. The methodologies utilized by organizations like API and CMA must be transparent in how they gather time-to-failure data for hidden failures. It is essential to review if they have published their findings and provide a source reference for further validation.
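To make the point about bench-test data concrete: a pass/fail bench result only reveals whether each valve had failed by its test age, not when it failed. A maximum-likelihood fit of Weibull parameters is still possible over such binary (current-status) data, by maximising the product of F(t) over failed valves and 1 − F(t) over passing ones. A crude grid-search sketch, with all parameter ranges and bench results invented for illustration:

```python
import math

def weibull_cdf(t, beta, eta):
    return 1.0 - math.exp(-((t / eta) ** beta))

def neg_log_likelihood(data, beta, eta):
    # data: (age at bench test, found_failed) -- status only, no failure time
    nll = 0.0
    for t, failed in data:
        p = min(max(weibull_cdf(t, beta, eta), 1e-12), 1 - 1e-12)
        nll -= math.log(p) if failed else math.log(1.0 - p)
    return nll

def fit_weibull(data):
    """Crude grid-search MLE over shape (beta) and scale (eta, years)."""
    grid = ((b / 10, e) for b in range(5, 31) for e in range(2, 61))
    return min(grid, key=lambda p: neg_log_likelihood(data, *p))

# Invented bench results: 40 valves tested at 5 years (3 found failed),
# 40 valves tested at 10 years (12 found failed)
data = ([(5, True)] * 3 + [(5, False)] * 37
        + [(10, True)] * 12 + [(10, False)] * 28)
beta_hat, eta_hat = fit_weibull(data)
print(beta_hat, eta_hat)
```

Note that with only one or two distinct test ages the surface is nearly flat along a ridge of (β, η) pairs, which illustrates why 'suggested' default parameters derived from pooled pass/fail data deserve the scrutiny argued for above.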