
More Adventures in Networking!


WARNING - Some tech-speak and jargon below. I’ve tried to be clear, but I’m following my more-or-less actual stream of consciousness, so the jumps and links might not always make sense. Also, as will become apparent, I am NOT an expert at troubleshooting networking issues. An expert would have known where to start and how to interpret the data more quickly, and would probably have figured this out and resolved it in about a half-hour. It took me a LOT longer than that and was rather stressful, but I really did learn a lot, so I guess I’m better off than when I started.


When I began this blog, one of my initial goals was simply to generate posts, so that I could get an idea of how the platform worked. There were samples provided, of course, but it just isn’t the same. I wanted “real” content, and was posting as I went along setting things up.


Then, after my “Adventures in Networking” posts (https://www.til-technology.com/my-blog/categories/ubiquiti), I started settling in a bit, and found that a weekly schedule worked reasonably well for me. Not always long or complex posts, but I found it a great motivation to continue learning (and generating content), and managed to stay more or less consistent for months.


And then last weekend happened.... (ominous music)


We were doing some work around the house, and needed to shut off the power to one part of the house. Unfortunately, the label on the circuit-breaker was incorrect, and... I flipped the breaker for all the networking gear. Oops! I then flipped it back (and made sure I fixed the label on the circuit-breaker box, of course).


Not a problem, right? We’ve had other power interruptions, and the major impact was usually resetting the surge-protector switch. All back to normal.


Except it wasn’t. I couldn’t connect to the web on my main computer, and my son’s PS4 wasn’t able to connect either. Hm. Network issue? Let’s check out the UDM Pro (UniFi Dream Machine Pro, from Ubiquiti), and – uh, oh.


I can connect, but can’t access the network dashboard. That is not good. I can get to the status page, though – what can I see there?


Well, I can see that there’s an update which has not been applied. Maybe an issue with a patch? Let’s try updating the UDM Pro, then. Click the button, and... nothing happens.


Ok. Let me check the ISP (Internet Service Provider) router. Experimentation demonstrates that it can reach the Web, but I’m getting an error message. Tried resetting it, then called the provider.


Unfortunately, the “tech support” analyst appeared to know less about troubleshooting issues like this than I did (and I am NOT an expert, here...). To simplify things, I tried plugging the UDM Pro into a different Ethernet port (which made no difference), then unplugged it altogether. Still getting an error, so it clearly seems to be in the router or the cables. The analyst suggested changing the cables, but didn’t even know what kind of cables we were talking about (and eventually looked up the difference between RJ11 and RJ45 cables...). Not the greatest tech support experience of my life, but it helped me figure out that the router was apparently working, so the error it reported was apparently a separate issue (i.e., something I can look into later). I’ve ordered replacement cables so that I can rule them out as the cause. If they turn out to be fine, then I suspect I’ll have to get a replacement router, as there’s not much else I’m allowed to do on that side.

So, that was a red herring. Back to the internal network. I guess I’d better start doing some research. I found out how to connect directly to the UDM Pro, but didn’t have the patch I needed. That’s when I discovered that I DID have access to the web, but only via IP address. In other words, names weren’t being resolved, which suggested an issue with the Pihole. Let’s have a look there. Hm. The Raspberry Pi is up and running, but the Pihole DNS isn’t working. That suggests that if I can update the DNS resolver being used by the UDM Pro, I should be able to bypass the Pihole, address the immediate issue, and deal with the Pihole later.
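The "works by IP address but not by name" observation can be turned into a quick triage check. The following is only a minimal sketch, not what was actually run during the incident: the function names are made up for illustration, and the test address (1.1.1.1) and hostname (example.com) are assumptions that can be swapped for anything reachable.

```python
import socket


def can_connect(ip_address: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to a literal IP address succeeds."""
    try:
        with socket.create_connection((ip_address, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can turn a hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose(ip_ok: bool, name_ok: bool) -> str:
    """Map the two observations to a rough first guess about where to look."""
    if ip_ok and name_ok:
        return "connectivity and DNS both look fine"
    if ip_ok:
        return "connectivity works but DNS is failing"
    if name_ok:
        return "DNS resolves but the connection fails"
    return "no connectivity at all"


# Example usage (assumed address and hostname, not taken from the incident):
# print(diagnose(can_connect("1.1.1.1"), can_resolve("example.com")))
```

A result of "connectivity works but DNS is failing" corresponds to the situation described above, and points at the resolver rather than at the router or cables.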


As a side-note for possible future discussion, one vital element in troubleshooting is the ability to triage and prioritize. While I am not a technical expert in networking, I have quite a bit of experience in these areas, so it’s not hard for me to simply push aside an issue once I confirm it’s not the root cause of my current problem.


At any rate, I got DNS working on the Raspberry Pi, but the Pihole DNS was still not working, and I still couldn’t connect to the Ubiquiti dashboard. As expected, I discovered that the page was trying to connect to a Ubiquiti service and couldn’t resolve the DNS entry.

Sigh. Should I just do a factory reset on the UDM Pro and start fresh? “Nuke the site from orbit. Only way to be sure.” (https://www.youtube.com/watch?v=nnHmUk_J6xQ) Wait – let’s reach out to Ubiquiti support, first. They were pretty helpful, and gave me instructions on how to manually apply the firmware update.

So, download the patch, connect to the UDM Pro via SSH (https://en.wikipedia.org/wiki/Secure_Shell_Protocol). Check everything, double-check, take some screenshots, take some notes, and hit enter.

Wait – that doesn’t look right. The screen flashed what appeared to be some errors, then the SSH connection dropped. Try to reconnect... and find that SSH is now being refused by the device. Uh, oh...

Start reading through the documentation and notes again... and discover that I had run the wrong update command. (I had missed a sub-heading.)

PANIC!!!!!!


No. Deep breath. And another. Try logging into the dashboard – I can still get in, but SSH is now disabled and I can’t turn it back on because of the DNS issue. Facepalm.


Try rebooting from the dashboard. Nothing. Waited about 10 minutes, said “YOLO” (https://en.wikipedia.org/wiki/YOLO_(aphorism)), decided it was more dignified to use Latin, said “morituri te salutant!” (https://en.wikipedia.org/wiki/Ave_Imperator,_morituri_te_salutant), and pulled the plug on the UDM Pro.

Fortunately, it eventually came back up and I was able to connect to it again via SSH, apply the patch, and successfully reboot it. Whew!

Went back to the dashboard, checked the network, and... still nothing.

Again, sigh. Progress made, at least. The UDM Pro is now patched and working – now I can look at the Pihole again. I had pinged it a number of times and confirmed that it was accessible but the DNS service wasn’t coming up. Now that the UDM Pro is patched, let’s try again. Still not working.

Let’s check the logs, and find an error message. No files in the log directory? NAS (Network Attached Storage) issue? Log directly into the NAS and have a look. Everything seems fine there - Services up, drives up, I can see the log files (last updated around the time of the power outage), and all the privileges seem to be in order. All green, apparently.

Something wrong with the connection from the Raspberry Pi to the NAS? I can get to the directory, but it appears empty. I’d have expected any issues to result in a directory error rather than an apparently empty directory, but sure. Why not? Let’s drop and recreate the mount points – Oh! That seems to have done something – I can see content now. Try to restart the Pihole DNS again? Yay!! Success!!
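A directory that exists but looks empty is easy to mistake for a healthy mount. As a rough illustration (it does not explain why the mounts broke in the first place), here is a small sketch of a check that could run on the Raspberry Pi before starting services that depend on the NAS. The function name and the example path are hypothetical, and a stale network mount may still report as mounted, so treat the result as a hint rather than proof.

```python
import os


def mount_status(path: str) -> str:
    """Classify a path that is expected to be a mounted share."""
    if not os.path.isdir(path):
        return "missing"
    if not os.path.ismount(path):
        # The directory exists, but nothing is mounted on it, so we would
        # only be seeing the (usually empty) local placeholder directory.
        return "not mounted"
    try:
        with os.scandir(path) as entries:
            has_entries = any(entries)
    except OSError:
        return "unreadable"
    return "ok" if has_entries else "mounted but empty"


# Example usage (hypothetical mount point):
# print(mount_status("/mnt/nas/pihole-logs"))
```

Anything other than "ok" would be a reason to drop and recreate the mount before restarting the service that depends on it.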

Let’s go back to the UDM Pro, and try the network again... Yay!! Success!!

Let’s look around a bit, and see whether everything is working. Hm. Strange – my main computer’s connection to the NAS seems to be gone as well... Drop and recreate, and everything seems to be back.

Whew!

But what the heck happened?

With 20/20 hindsight, the root cause of the network issues appears to have been the broken mount-points to the NAS, which “broke” the Pihole. The inability to log into the UDM Pro was apparently an effect, rather than a cause, and the UDM Pro patch was apparently unrelated, as was the ISP router issue (which I will add to my Kanban board).

Let’s be thorough, and list a few of the lessons-learned (to be fair, I already knew a few of these things, but this experience emphasizes their importance – i.e., check this FIRST):

  1. If there are network issues, see if VPN connectivity works (or whether you can reach a site directly by IP address). If it does, it’s probably a DNS-related issue.

  2. Best to make sure that you have defined backup DNS services, but I’m not sure that would have helped here. I think it would have helped if the Raspberry Pi were down and not responsive, but it was up and running, as was the Pihole. Issue was that the Pihole’s DNS service was down, which might not have actually caused a “fail-over”. Might be something to look into.

  3. Easiest approach to troubleshooting is often to disconnect and test components in isolation. E.g., unplug the UDM Pro from the ISP router to confirm that it’s not the issue – it took me longer than it should have to rule out the ISP router as the issue.

  4. Best to set up SSH access to networking devices in advance, in case the controller itself has problems. That would have saved quite a bit of time in this case.

  5. Look more seriously into a UPS (Uninterruptible Power Supply). That would dramatically reduce the risk of a power failure causing any sort of issue.
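Lessons 1 and 2 both come down to the fact that a host can answer pings while its DNS service is down. One way to test the service itself is to send a DNS query straight to a specific server, bypassing the system resolver. The sketch below uses only the Python standard library and builds a minimal A-record query by hand. The function names, the server address in the usage comment, and the test hostname are all assumptions for illustration, and it says nothing about whether a backup DNS entry would actually trigger a fail-over.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def dns_server_answers(server: str, name: str = "example.com",
                       timeout: float = 2.0) -> bool:
    """Return True if the given server sends back any DNS response."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    # A plausible reply has a full header, echoes our ID, and has the
    # "response" bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


# Example usage (hypothetical address for the Raspberry Pi):
# print(dns_server_answers("192.168.1.2"))
```

If the host responds to ping but this returns False, the machine is up and the DNS service is not, which is the case described in lesson 2.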

Those are enough for now. I learned other things as well, so it was an experience I’m glad to have had, though not one I’d be happy to repeat.


But I STILL don’t understand why the NAS connections failed... Always more to learn.


Cheers!
