Critical Network Cable Fault: Understanding the Impact on CPX and HMI Connectivity

22-01-2025
Maxkling
19 comments

Question:

Our network setup is straightforward but extensive. Each station has a CPX L32e and a Magelis HMI connected to an unmanaged switch, and this arrangement is repeated roughly 30 times, with the unmanaged switches uplinked to N-Tron managed switches arranged in a ring topology. The cable connecting each unmanaged switch to its N-Tron is highly vulnerable to pinching and damage. At present the network is mainly supervisory; only the Magelis and the CPX need connectivity to do their jobs.

The main problem is that when this cable is pinched and shorts, it can take down the entire network: every CPX and HMI loses its CIP connection. I have reproduced this on the bench and analyzed it with Wireshark. When the wire shorts, it effectively loops the transmitter (Tx) and receiver (Rx) together for both 10/100 connections, which produces a ghost-device condition. In the traffic between the PLC and HMI, the CIP connection fails when the PLC sends an ARP request for the address "0.0.0.0", and the TCP connection does not re-establish until the faulty wire is disconnected. I also tested the unmanaged switch in isolation with just the PLC and HMI and saw the same behavior.

My suspicion is that the wire fault corrupts the ARP table, but I have not yet been able to confirm this. Wherever in the network I create the wire fault, all CPXs send an ARP request for "0.0.0.0", and communication fails until the faulty wire is removed. This never happens under normal conditions. Packet counts and overall traffic also stay low and stable with the shorted wire connected, so no packet storm or loopback is occurring.

Tomorrow I'll share diagrams and capture data to illustrate the problem.

Top Replies

The cables run through a festoon system, and its poor layout puts them at risk. Most of the conductors are heavy-duty SOOW cable, but there is one fragile Cat5 cable that regularly gets tangled and damaged. We are looking at other routing options, but for now we are stuck with the existing arrangement. Our main goal is to make the network more resilient when this happens. If I plugged my faulty cable into the internet, it would not take the whole internet down; by the same logic, one bad drop cable should not be able to take down our network, which is why I'm looking for ways to isolate the fault.

23-01-2025
Maxkling

Which N-Tron model are you using, and which model of unmanaged switch?

24-01-2025
Ken Roach

I'm using standard Netgear unmanaged switches. The N-Tron is a 708FX; I'll post the exact model numbers tomorrow.

24-01-2025
Maxkling

Because your switch is unmanaged, you are probably seeing only broadcast packets. The "ARP for address 0.0.0.0" is an ARP Probe: a standard duplicate-address check, with the sender IP set to 0.0.0.0, that a device sends when it first connects to the network. Are you saying that when a short occurs on one of your drop lines, the ControlLogix systems and HMIs connected to other unmanaged switches on the network all lose their connections, not just the one with the damaged cable? That would strongly suggest a Layer 2 loop. Interesting.

24-01-2025
Ken Roach
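
To make the ARP Probe concrete, here is a minimal Python sketch that builds one as RFC 5227 defines it: an ordinary ARP request whose sender IP is 0.0.0.0 and whose target IP is the address the device wants to check. The MAC address used below and the function names `build_arp_probe` and `describe_arp` are hypothetical, and the sketch omits the padding a real NIC adds to reach the 60-byte Ethernet minimum.

```python
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def ip_to_bytes(ip: str) -> bytes:
    return bytes(int(part) for part in ip.split("."))


def build_arp_probe(sender_mac: bytes, probed_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP Probe (RFC 5227).

    A probe is an ordinary ARP request with the sender IP set to 0.0.0.0
    and the target IP set to the address the host wants to use.
    """
    ethernet = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                       # hardware type: Ethernet
        0x0800,                  # protocol type: IPv4
        6,                       # hardware address length
        4,                       # protocol address length
        1,                       # opcode 1 = request
        sender_mac,              # sender MAC: the probing device
        ip_to_bytes("0.0.0.0"),  # sender IP: zero in a probe
        b"\x00" * 6,             # target MAC: unknown
        ip_to_bytes(probed_ip),  # target IP: the address being checked
    )
    return ethernet + arp  # 42 bytes; a NIC pads this to 60 before the FCS


def describe_arp(frame: bytes) -> str:
    """Summarize an ARP request as 'Who has X? Tell Y'."""
    if frame[12:14] != struct.pack("!H", ETHERTYPE_ARP):
        return "not ARP"
    (opcode,) = struct.unpack("!H", frame[20:22])
    if opcode != 1:
        return "not an ARP request"
    sender_ip = ".".join(str(b) for b in frame[28:32])
    target_ip = ".".join(str(b) for b in frame[38:42])
    kind = "probe" if sender_ip == "0.0.0.0" else "request"
    return f"Who has {target_ip}? Tell {sender_ip} ({kind})"
```

For example, `describe_arp(build_arp_probe(bytes.fromhex("020000621c78"), "10.5.32.12"))` returns `"Who has 10.5.32.12? Tell 0.0.0.0 (probe)"`. A normal ARP request differs only in carrying the sender's real IP, which is why a probe can look like an "ARP for 0.0.0.0" at first glance.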

Maxkling said the problems start when the wire gets pinched and shorts. I would fix that problem first.

24-01-2025
janner_10


janner_10 wrote: "This is my top priority for improvement." I anticipated that response. We are working on the cable problem, but the current setup limits what we can do. Regardless, a network should stay up even with a pinched, faulty wire, especially when that wire only connects an end device and is not part of the core infrastructure.

25-01-2025
Maxkling

Here's a simplified diagram of the setup I'm bench testing. The production network mirrors it but has more than 30 additional nodes. To make the bench match production, I plan to replace my Netgear unmanaged switches with the Harting Ha-VIS eCon2050B-A 5-port unmanaged switches we use in production, and I'll start capturing traffic with Wireshark. Thanks for the helpful comments so far. One detail about the ARP requests: they don't start immediately when the shorted cable is connected. There can be a delay of several minutes before the first one, and they then recur every few minutes. The requests appear to come from the CPX.

25-01-2025
Maxkling

Consider high-flex industrial Ethernet cable, such as the products from L-com and Belden. These cables are designed for reliable performance in demanding environments and offer much better flexibility and durability than standard cable in harsh industrial conditions.

25-01-2025
jimtech67

I think the PLC is seeing its own ARP request for 0.0.0.0 reflected back to it. Because the Rx/Tx wires are shorted, the frame comes straight back, which probably confuses the unmanaged switch, and the PLC then responds to the request with its own MAC address.

26-01-2025
Dravik
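
Dravik's idea, that the shorted pair reflects the PLC's own broadcast back at it and confuses the unmanaged switch, can be illustrated with a toy model of a MAC-learning switch. This is a simplified Python sketch, not the behavior of any specific Netgear, Harting, or N-Tron firmware; the port numbers and device names are hypothetical, and the faulty port is assumed to return every frame sent out of it.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Toy unmanaged switch: learns source MACs, floods unknown/broadcast."""
    ports: list
    mac_table: dict = field(default_factory=dict)

    def receive(self, in_port: int, src: str, dst: str) -> list:
        self.mac_table[src] = in_port  # the most recent ingress port wins
        if dst == BROADCAST or dst not in self.mac_table:
            return [p for p in self.ports if p != in_port]  # flood
        out_port = self.mac_table[dst]
        return [] if out_port == in_port else [out_port]


def simulate_shorted_drop(plc_port=1, hmi_port=2, faulty_port=3):
    """Return (PLC got its own probe back, PLC's learned port, next HMI->PLC egress)."""
    sw = LearningSwitch(ports=[plc_port, hmi_port, faulty_port])
    # Normal traffic: both devices are learned on their real ports.
    sw.receive(hmi_port, "hmi", "plc")
    sw.receive(plc_port, "plc", "hmi")
    # The PLC broadcasts an ARP probe; the switch floods it, faulty port included.
    flooded = sw.receive(plc_port, "plc", BROADCAST)
    echoed_to_plc = False
    if faulty_port in flooded:
        # Assumption: the shorted cable returns the frame into the same port.
        echoed = sw.receive(faulty_port, "plc", BROADCAST)
        echoed_to_plc = plc_port in echoed
    next_hmi_to_plc = sw.receive(hmi_port, "hmi", "plc")
    return echoed_to_plc, sw.mac_table["plc"], next_hmi_to_plc
```

`simulate_shorted_drop()` returns `(True, 3, [3])`: the PLC receives its own probe back, and the switch moves the PLC's MAC entry to the faulty port, so the next HMI-to-PLC frame is forwarded into the shorted cable. A real switch would normally relearn the correct port as soon as the PLC transmits again, so this shows one plausible ingredient of the failure rather than a full explanation of why the session stays down.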

I have uploaded the Wireshark capture files to Dropbox for your review: [Download Wireshark Network Test](https://www.dropbox.com/s/2k66htbsk1o8dgc/wireshark_network_test_5.9.18.zip?dl=0). In the latest network diagram, I have added another N-Tron for clarity. The two standard capture files serve as a baseline showing that network traffic does not increase, so neither a storm nor a loopback condition is present.

### Analysis of "fault_at_unman_plc_snoop"

In this setup, I connected my monitoring device (labeled "PLC Snoop" in the diagram) between the PLC and the unmanaged switch, then connected my faulty wire to the unmanaged switch. At approximately 59.91 seconds, PLC "Rockwell_62:1c:78" broadcasts an ARP request: "Who has 10.5.32.12? Tell 0.0.0.0." This broadcast appears to disrupt the TCP session, which does not resume until the faulty wire is disconnected at around 142 seconds.

### Analysis of "fault_at_ntron34_plc_snoop"

For this configuration, I again tapped between the PLC and the unmanaged switch ("PLC Snoop" in the diagram), and connected the faulty wire to N-Tron 34 at the location marked "Wire Fault 2." At approximately 204 seconds, the TCP session drops. This points to a problem with ARP or MAC address resolution.

As a next step, I plan to remove the unmanaged switches and connect the PLC and HMI directly to the N-Trons to troubleshoot further.

27-01-2025
Maxkling
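
To locate probes like the one at the 59.91-second mark without scrolling through Wireshark, the following standard-library Python sketch scans a classic libpcap file for ARP requests whose sender IP is 0.0.0.0 and reports when each one occurs relative to the first packet. It assumes the classic `.pcap` format with untagged Ethernet frames; a `.pcapng` file would first need converting, for example with `editcap -F pcap`. The function names are hypothetical.

```python
import struct

PCAP_MAGIC_US = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D  # classic pcap, nanosecond timestamps
LINKTYPE_ETHERNET = 1


def read_pcap(data: bytes):
    """Yield (timestamp_seconds, frame_bytes) from a classic pcap file."""
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in (PCAP_MAGIC_US, PCAP_MAGIC_NS):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng needs converting)")
    divisor = 1e9 if magic == PCAP_MAGIC_NS else 1e6
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield ts_sec + ts_frac / divisor, data[offset:offset + incl_len]
        offset += incl_len


def find_arp_probes(data: bytes):
    """List (seconds since first packet, sender MAC, target IP) per ARP probe."""
    probes, start = [], None
    for ts, frame in read_pcap(data):
        if start is None:
            start = ts
        # Untagged Ethernet + ARP is at least 42 bytes; EtherType 0x0806 = ARP.
        if len(frame) < 42 or frame[12:14] != b"\x08\x06":
            continue
        (opcode,) = struct.unpack("!H", frame[20:22])
        if opcode == 1 and frame[28:32] == b"\x00\x00\x00\x00":
            mac = ":".join(f"{b:02x}" for b in frame[22:28])
            target_ip = ".".join(str(b) for b in frame[38:42])
            probes.append((round(ts - start, 3), mac, target_ip))
    return probes
```

Assuming a capture has been saved as, say, `fault_at_unman_plc_snoop.pcap`, `find_arp_probes(open("fault_at_unman_plc_snoop.pcap", "rb").read())` lists each probe's time, sender MAC, and target IP; the gaps between successive times show how often the probes recur.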

At first glance, I'm not convinced that Spanning Tree will fully solve this. On Cisco switches, I would enable LoopGuard and/or UDLD on the copper ports as a stopgap; however, that won't help with the unmanaged devices. **Note:** I initially cited the wrong command.

27-01-2025
Dravik

After bypassing the unmanaged switches and connecting the PLC and HMIs directly to the N-Tron devices, we are still experiencing connectivity problems. I plan to conduct further testing using a Stratix switch along with a NAT router to identify and resolve the issues.

27-01-2025
Maxkling

Thanks for sharing the Wireshark captures and diagrams! I'm not sure the ARP 0.0.0.0 is actually causing the connection drop; it may simply be a symptom of a broader issue. Notably, the low-level Ethernet packets of the N-Tron ring protocol (which Wireshark shows as Red Lion) keep flowing rapidly even after the disruption. I'm curious what diagnostics the 708TX records during this event. What type of tap are you using between the PLC and the switch? I can see multiple Magelis HMI requests going to the PLC, but no corresponding replies from the PLC, which makes me think Wireshark is not capturing all of the traffic.

27-01-2025
Ken Roach

Hello Ken, I'll check the available fault logs to see what they show. I connected a Stratix switch with a PLC and HMI, and surprisingly, everything kept working even with the faulty wire connected. I've run out of time for testing today, so I'll continue tomorrow.

27-01-2025
Maxkling

Wire faults need to be addressed promptly, because poor cable management leads to a host of other communication problems. I strongly recommend managed switches for their port diagnostics: they let you monitor Ethernet errors on each port, including CRC/FCS errors and other counters. That information helps you diagnose problems by error type and severity and gives you a baseline to measure improvement against. I've dealt with similar problems on I/O networks with up to 30 nodes, and following Ethernet cabling best practices matters. Respect all the specifications, including bend radius, pull tension, and separation from electromagnetic interference (EMI) sources, and make sure connectors are installed correctly. I also recommend shielded Ethernet cable for the added protection. A poorly designed or installed network causes frequent disruptions that cost time and money, so a robust network infrastructure is key to reliable communication.

27-01-2025
iraiam
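
The CRC/FCS error counters a managed switch reports come from the 4-byte frame check sequence at the end of every Ethernet frame. As a sketch of what that check does, the Python below appends and verifies an FCS using `zlib.crc32`, which computes the same CRC-32 as IEEE 802.3 (the FCS is sent least-significant byte first). The 60-byte all-zero frame and the function names are hypothetical.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Append the 4-byte Ethernet FCS (IEEE 802.3 CRC-32, low byte first)."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the frame and compare it with the trailing FCS."""
    body, received_fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body)) == received_fcs


if __name__ == "__main__":
    good = append_fcs(bytes(60))          # a hypothetical minimum-size frame
    damaged = bytearray(good)
    damaged[20] ^= 0x01                   # flip one bit, as line noise might
    print(fcs_ok(good), fcs_ok(bytes(damaged)))  # True False
```

A switch port increments its CRC/FCS error counter for each received frame that fails this check and discards the frame, so damaged or poorly terminated cabling typically shows up as a climbing count on that port.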

Regarding Ken Roach's point that the Magelis HMI requests to the PLC show up in the capture but the PLC's replies do not, so Wireshark may not be capturing all of the traffic: to capture more, you need a managed switch with port mirroring. For example, if the Wireshark computer is connected to port 6 of the managed switch and you want to monitor ports 1 and 2, mirror ports 1 and 2 to port 6. Every packet sent or received on ports 1 and 2 is then also copied to port 6, so Wireshark sees the complete conversation.

27-01-2025
iraiam
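
Ken Roach's remark that a capture on an unmanaged switch mostly shows broadcasts, and iraiam's port-mirroring advice, can be illustrated with a toy forwarding model in Python. This is a sketch under simplifying assumptions (a single learning switch, mirroring of both received and transmitted frames); the device names are hypothetical, and real managed switches configure mirroring through their own interfaces.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def egress_ports(mac_table, ports, in_port, dst):
    """Ports a learning switch sends a frame out of (flood if unknown or broadcast)."""
    if dst == BROADCAST or dst not in mac_table:
        return {p for p in ports if p != in_port}
    out_port = mac_table[dst]
    return set() if out_port == in_port else {out_port}


def monitor_sees(mac_table, ports, in_port, dst, monitor_port, mirrored=frozenset()):
    """Would a capture PC on monitor_port receive a copy of this frame?

    Mirroring is modeled in both directions: any frame entering or leaving
    a mirrored port is also copied to the monitor port.
    """
    out = egress_ports(mac_table, ports, in_port, dst)
    if monitor_port in out:
        return True
    return in_port in mirrored or bool(out & mirrored)


if __name__ == "__main__":
    ports = range(1, 7)
    table = {"plc": 1, "hmi": 2, "wireshark_pc": 6}  # hypothetical layout
    # HMI -> PLC unicast, no mirroring: the capture PC never sees it.
    print(monitor_sees(table, ports, 2, "plc", 6))                   # False
    # A broadcast from the PLC (e.g. an ARP probe) is flooded, so it is seen.
    print(monitor_sees(table, ports, 1, BROADCAST, 6))               # True
    # With ports 1 and 2 mirrored to port 6, the unicast is seen too.
    print(monitor_sees(table, ports, 2, "plc", 6, mirrored={1, 2}))  # True
```

The model only covers switch ports; a passive tap is a separate device that copies the traffic on the cable itself, so what it shows depends on the tap rather than on the switch's forwarding table.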

If you're having trouble with cable in a festoon wireway, it may be time to upgrade to a more robust Ethernet cable, such as Allen-Bradley 1585 cable or Belden MarineTuff Offshore and Marine cable. The Belden cable also has a bronze braid armor for extra durability. In my experience servicing onshore drilling rigs in the oil and gas sector, marine-grade cables hold up very well in demanding environments.

27-01-2025
mhammer214

Ken Roach asked whether a short on one drop line takes down the ControlLogix systems and HMIs on other unmanaged switches as well, not just the device with the damaged cable, and noted that this would point to a Layer 2 loop. I had the same impression initially.

27-01-2025
sparkie

Regarding iraiam's suggestion to mirror ports 1 and 2 to the Wireshark port on a managed switch: I did use port mirroring on the N-Tron switches. I have managed switches available, but the PLC and HMI sit on the unmanaged switch, which uplinks to the managed switch, so to capture their traffic at the unmanaged switch I use a passive tap. Also note that the Stratix switch recognizes the wire fault as a loopback condition and blocks the port. However, this only happens when the faulty wire is connected directly to the Stratix; if the fault is elsewhere in the network, no port is blocked.

27-01-2025
Maxkling


Frequently Asked Questions (FAQ)

FAQ: 1. What is the main issue with the current network setup involving CPX and HMI systems?

Answer: - The main issue is that the cable connecting the unmanaged switch to the N-Tron managed switch is prone to pinching and damage, which can create a short circuit. This leads to a network failure where all CPXs and HMIs lose their CIP connection, causing significant communication issues.

FAQ: 2. How does a cable fault impact the network connectivity for CPX and HMI devices?

Answer: - When the cable is pinched and shorts, it loops the transmitter (Tx) and receiver (Rx) lines together, resulting in a "ghost device" scenario. The CIP connection drops around the time the PLC sends an ARP Probe (an ARP request with sender address 0.0.0.0, described in the original post as an ARP request for "0.0.0.0"), and the TCP connection does not re-establish until the faulty wire is disconnected. Whether the probe causes the drop or is only a symptom remains an open question in the thread.

FAQ: 3. What diagnostic tools have been used to analyze the network issue, and what were the findings?

Answer: - Wireshark was used for the network analysis, with a passive tap between the PLC and the switch. Packet counts and overall network traffic stay low even with the fault present, so no packet storm or loopback is occurring. The original poster suspects the ARP table becomes corrupted, but this was not confirmed, and Ken Roach noted that the captures may be missing the PLC's replies.
