User janner_10 commented: "This is my top priority for improvement." I anticipated that response. We are addressing the issue, but the current setup is holding us back. Regardless, a network should stay resilient and operational despite a single pinched wire, particularly when that wire is only a drop to an end device rather than part of the core infrastructure.
Here’s a simplified overview of my current network setup, which I am using for bench testing purposes. The production network mirrors this setup but includes over 30 additional nodes. To achieve a truly comparable network environment, I plan to replace my existing Netgear unmanaged switches with Harting Ha-VIS eCon2050B-A 5-port unmanaged switches, the ones we employ in production. I will also begin capturing data with Wireshark to analyze traffic patterns. I appreciate the insightful comments received so far.
Regarding the ARP request issue, it's interesting to note that the anomalous requests don't occur immediately when the shorted cable is connected. Instead, there can be a delay of several minutes before they start, and they tend to recur every few minutes. Importantly, the source of these ARP requests appears to be the CPX device.
By addressing these challenges, I aim to enhance our network’s performance and reliability.
I believe the issue at hand is that the PLC is detecting an ARP request for the IP address 0.0.0.0, which is being mirrored back to itself. This situation arises because the Rx/Tx wires are shorted, likely causing confusion for the unmanaged switch. The PLC is responding to this request using its own MAC address.
I have uploaded the Wireshark capture files to Dropbox for your review: [Download Wireshark Network Test](https://www.dropbox.com/s/2k66htbsk1o8dgc/wireshark_network_test_5.9.18.zip?dl=0).
In the latest network diagram, I have added an additional N-Tron for clarity. The two standard capture files serve as a benchmark to demonstrate that there is no increase in network traffic, indicating that neither a storm nor a loopback condition is present.
### Analysis of "fault_at_unman_plc_snoop"
In this setup, I have connected my monitoring device (denoted as "PLC Snoop" in the diagram) between the PLC and the unmanaged switch. I then connected my faulty wire to the unmanaged switch. Notably, at approximately the 59.91-second mark, you will observe PLC "Rockwell_62:1c:78" broadcasting an ARP request: "Who has 10.5.32.12? Tell 0.0.0.0." This broadcast appears to disrupt the TCP session, which does not resume until the faulty wire is disconnected around the 142-second mark.
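A request of the form "Who has X? Tell 0.0.0.0" has the shape of an ARP Probe as described in RFC 5227: an ARP request whose sender IP field is all zeros. The sketch below is only an illustration of how such a frame could be told apart from an ordinary request when going through a capture by hand. The helper names and the MAC address `02:00:00:62:1c:78` are hypothetical placeholders (the capture only shows the vendor-resolved name "Rockwell_62:1c:78"); the target IP is the one quoted above.

```python
import socket
import struct

ARP_REQUEST = 1  # ARP opcode for a request


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build the 28-byte ARP body (Ethernet/IPv4) of a request, for testing."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, ARP_REQUEST,          # htype, ptype, hlen, plen, opcode
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )


def parse_arp(payload: bytes) -> dict:
    """Parse an Ethernet/IPv4 ARP body into a dict of readable fields."""
    if len(payload) < 28:
        raise ValueError("ARP payload too short")
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", payload[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    sha, spa, tha, tpa = struct.unpack("!6s4s6s4s", payload[8:28])
    return {
        "oper": oper,
        "sender_mac": ":".join(f"{b:02x}" for b in sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


def classify(arp: dict) -> str:
    """Label an ARP packet as 'probe', 'announcement', 'request' or 'reply'."""
    if arp["oper"] != ARP_REQUEST:
        return "reply"
    if arp["sender_ip"] == "0.0.0.0":
        return "probe"          # duplicate-address check, sender IP all zeros
    if arp["sender_ip"] == arp["target_ip"]:
        return "announcement"   # gratuitous ARP for the sender's own address
    return "request"


# Placeholder MAC (hypothetical); target IP taken from the capture description.
plc_mac = bytes.fromhex("02000000621c78".replace("0200000062", "0200006 2".replace(" ", ""))) if False else bytes.fromhex("020000621c78")
probe = build_arp_request(plc_mac, "0.0.0.0", "10.5.32.12")
print(classify(parse_arp(probe)))
```

If the frames seen around the 59.91-second mark classify as probes, that would be consistent with the PLC re-running its duplicate-address check rather than with a genuine lookup of another host, though it does not by itself explain why the TCP session drops.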
### Analysis of "fault_at_ntron34_plc_snoop"
For this configuration, I have also tapped between the PLC and the unmanaged switch (labeled as "PLC Snoop" in the diagram). I connected the faulty wire to N-Tron 34, specifically at the location marked "Wire Fault 2." At approximately the 204-second mark in the capture, you will notice that the TCP session drops.
This situation clearly indicates a problem related to ARP or MAC address resolution. As the next step, I plan to eliminate the unmanaged switches from the network setup and connect the PLC and HMI directly to the N-Trons to troubleshoot further.
At first glance, I'm not convinced that Spanning Tree will fully resolve this issue. On Cisco devices, I recommend activating LoopGuard and/or UDLD on the copper ports as a temporary solution; however, this won't address the challenges posed by unmanaged devices. **Note:** I initially cited an incorrect command.
If you're looking for a more permanent fix or additional ideas to enhance your network's reliability, consider exploring alternative strategies to manage potential loops effectively.
After bypassing the unmanaged switches and connecting the PLC and HMIs directly to the N-Tron devices, we are still experiencing connectivity problems. I plan to conduct further testing using a Stratix switch along with a NAT router to identify and resolve the issues.
Thank you for sharing those Wireshark captures and diagrams! I have some doubts about whether the ARP 0.0.0.0 is actually causing the connection drop or if it is simply a symptom of a broader issue. It’s noteworthy that the low-level Ethernet packets associated with the N-Tron ring protocol (marked as Red Lion) continue to flow rapidly even after the disruption. I'm curious about the diagnostics that the 708TX is recording during this event. Could you clarify what type of tap you are using between the PLC and the switch? While I observe multiple Magelis HMI requests being sent to the PLC, I don’t see any corresponding replies from the PLC, which leads me to believe that we might not be capturing the complete data in Wireshark.
Hello Ken, I'll check the available fault logs to see what they show. I connected a Stratix switch with a PLC and HMI, and even with the faulty wire attached, everything surprisingly kept working. I'm out of time for today, so I plan to run more tests tomorrow to evaluate this thoroughly.
It is essential to address wire faults promptly, as they can lead to a host of other communication issues stemming from inadequate cable management. I strongly recommend implementing managed switches in your network infrastructure to enhance port diagnostics. Managed switches allow you to monitor Ethernet errors on specific ports, including CRC and FCS errors, along with other critical data. This valuable information will assist in diagnosing network issues based on error type and severity, creating a benchmark for improvement.
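To make the CRC/FCS counters a little more concrete, here is a minimal sketch of the check a receiving port performs on every frame. It assumes the standard Ethernet CRC-32, which Python's `zlib.crc32` computes, with the four FCS bytes appended least-significant byte first. The function names and the sample frame contents are made up for illustration; a managed switch does this in hardware and only exposes the resulting error counters.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Return the frame with its 4-byte Ethernet FCS appended."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the frame body and compare it with the trailer."""
    if len(frame_with_fcs) < 4:
        return False
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == struct.unpack("<I", trailer)[0]


# Hypothetical frame: broadcast destination, placeholder source, ARP EtherType.
frame = bytes.fromhex("ffffffffffff" "020000621c78" "0806") + b"\x00" * 46
good = append_fcs(frame)

# Simulate a marginal cable flipping a single bit in transit.
damaged = bytearray(good)
damaged[20] ^= 0x01
damaged = bytes(damaged)

print(fcs_ok(good), fcs_ok(damaged))
```

A port that keeps accumulating frames failing this check is a strong hint of a physical-layer problem on that specific drop, which is the per-port visibility unmanaged switches cannot give you. Note that many network adapters strip the FCS before the capture software sees the frame, so this check usually has to come from the switch counters rather than from Wireshark.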
Having experienced similar challenges in input/output networks with up to 30 nodes, I can attest to the significance of adhering to best practices in Ethernet cabling. It's vital to respect all technical specifications, including bend radius, pull tension, and the required distance from electromagnetic interference (EMI) sources, as well as ensuring proper connector installation. I highly recommend using shielded Ethernet cables for their added protection. A poorly designed or installed network can significantly disrupt operations and lead to frequent issues, costing you time and resources. Prioritizing a robust network infrastructure is key to maintaining efficient communication and preventing recurring problems.
Ken Roach expressed appreciation for the Wireshark captures and diagrams provided! However, he remains skeptical that the ARP 0.0.0.0 request is the root cause of the connection drop; he believes it may just be a symptom of a larger issue. Notably, the low-level Ethernet packets associated with the N-Tron ring protocol (identified as Red Lion) continue to flow swiftly even after the interruption. He questions what specific diagnostics the 708TX device is recording during this event.
Additionally, he inquires about the type of tap being used between the PLC and the switch. Although all Magelis HMI requests directed to the PLC can be observed, the PLC’s responses to those requests are absent, indicating that Wireshark may not be capturing all relevant traffic.
To enhance the data captured in Wireshark, it's crucial to utilize a managed switch. Typically, port cloning is required for effective logging. For instance, if the Wireshark computer is connected to port 6 of the managed switch and you want to monitor communications on ports 1 and 2, it is essential to clone ports 1 and 2 to port 6. This way, any packets transmitted or received on ports 1 and 2 will also be sent to port 6, allowing Wireshark to capture a comprehensive view of the network activity.
If you're encountering challenges with festoon wireway systems, it might be time to consider upgrading to a more robust Ethernet cable. For instance, you could explore options like the Allen Bradley 1585 cable or the Belden MarineTuff Offshore and Marine cable. Notably, the Belden cable offers the added benefit of a bronze braid armor for extra durability. My experience in the oil and gas sector, particularly in servicing onshore drilling rigs, has shown that marine-grade cables typically perform exceptionally well in demanding environments.
Ken Roach explained that if you're capturing on an unmanaged switch, you are likely only observing broadcast packets. The "ARP for address 0.0.0.0" is an ARP Probe: a standard duplicate-address check that a device sends, with 0.0.0.0 in the sender address field, when it initially joins the network. He also asked: "Are you indicating that when a short circuit occurs on one of your 'droplines,' the connections of all ControlLogix systems and HMIs linked to other unmanaged switches on the same network simultaneously fail, rather than just the device with the damaged cable?" This scenario appears to point towards a Layer 2 loop issue.
Interesting! I had the same impression initially.
To effectively analyze network traffic using Wireshark logs, it's essential to utilize a managed switch. This is crucial because, without it, Wireshark will only capture broadcast traffic unless ports are cloned to the port where the Wireshark system is connected. For instance, if you want to monitor all communications associated with devices on ports 1 and 2, connect your Wireshark computer to port 6 on a managed switch and configure ports 1 and 2 to be mirrored to port 6. This setup ensures that every packet transmitted or received on ports 1 and 2 is directed to port 6 for comprehensive logging by Wireshark.
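The reason a capture PC on an ordinary switch port mostly sees broadcasts can be shown with a toy model of a learning switch. This is a simplified sketch, not any vendor's implementation: the `Switch` class, the 8-port layout, and the placeholder MAC addresses are all assumptions for illustration. The port numbers (devices on 1 and 2, Wireshark on 6) follow the example above.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
PLC = "02:00:00:00:00:01"  # placeholder MAC on port 1
HMI = "02:00:00:00:00:02"  # placeholder MAC on port 2


class Switch:
    """Toy learning switch with an optional mirror (monitor) port."""

    def __init__(self, ports, mirror_sources=(), mirror_dest=None):
        self.ports = set(ports)
        self.mac_table = {}                      # learned MAC -> port
        self.mirror_sources = set(mirror_sources)
        self.mirror_dest = mirror_dest

    def receive(self, in_port, src, dst):
        """Return the set of ports the frame is sent out of."""
        self.mac_table[src] = in_port            # learn where the sender lives
        if dst == BROADCAST or dst not in self.mac_table:
            egress = self.ports - {in_port}      # flood broadcast/unknown frames
        else:
            egress = {self.mac_table[dst]} - {in_port}
        # Copy traffic entering or leaving a mirrored port to the monitor port.
        if self.mirror_dest is not None and (
            in_port in self.mirror_sources or egress & self.mirror_sources
        ):
            egress = egress | {self.mirror_dest}
        return egress


def capture_port_sees_unicast(switch, capture_port=6):
    """Let the switch learn both devices, then send PLC -> HMI unicast."""
    switch.receive(1, PLC, BROADCAST)
    switch.receive(2, HMI, PLC)
    return capture_port in switch.receive(1, PLC, HMI)


plain = Switch(range(1, 9))
mirrored = Switch(range(1, 9), mirror_sources={1, 2}, mirror_dest=6)
print(capture_port_sees_unicast(plain), capture_port_sees_unicast(mirrored))
```

Once both MAC addresses are learned, the plain switch delivers the PLC-to-HMI frame only to port 2, so port 6 never sees it; with ports 1 and 2 mirrored to port 6, the same frame is also copied to the capture port.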
In my case, I implemented port mirroring using N-Tron switches. While I do have access to managed switches, the setup often involves connecting the PLC and HMI back to the managed switch. To capture network traffic from an unmanaged switch, I rely on a passive tap to intercept the packets effectively.
It’s important to note that the Stratix switch recognizes wire faults as a loopback condition and will block the port. However, this only occurs when the faulty wire is directly connected to the Stratix switch; if the issue arises elsewhere in the network, there won't be any port blocking.
By understanding these concepts and implementing port mirroring with a managed switch, network administrators can gain valuable insights from their Wireshark analyses.