The other day, my EdgeRouter Lite completely died. It unfortunately turned out to be a non-trivial fix. I want to share what happened and how I fixed it.

The Problem

I wanted to make a simple change to my router configuration. When I tried to SSH into the box, my session would authenticate successfully but the connection was immediately closed.

I thought this was strange behavior, so I decided to check the GUI. I logged on to the admin console and tried to access the shell via the GUI. Same result. The window would open, print a few normal messages, and close the connection immediately. This time it just displayed a big Reconnect button over the terminal. Not good.

Lastly, I tried to make a change to the configuration using the GUI controls. Regardless of the type of change, it would fail with a generic message. So I was hosed. If you’re in this state, try to download a backup of your config, because there are rough waters ahead. Unfortunately for me, I wasn’t even able to download the config. It errored too.

I thought maybe it just needed a good ol’ reboot. I had over 3 months of uptime on this thing (which isn’t a lot for a router). I power-cycled the router without too much thought. It appeared to come back alive, but now I had a bigger problem: I could no longer ping the router’s IP. It was just gone. I tried doing some direct connections by unplugging things, setting a static 192.168.1.x IP and connecting to port eth0. Nothing. It was broken.

Patience

You know what sucks? Trying to debug problems when you are flying blind. I was crippled in two ways: one, I now had no internet access and two, I had no eyes into the boot process of the router. To complicate matters, I had not backed up my router config! What a mess.

If you have a local electronics store nearby or more patience than me, get a serial cable and save yourself some trouble. This cable will give you a console so you can see why your boot is failing. Unfortunately, I did not have that luxury. Onward!

Before trying any of the factory reset options, I would recommend cracking open your router and backing up your config. Even if you have a corrupt file system or disk failure on the router, you may still be able to recover your config.

Backup

Grab a tiny screwdriver and unscrew the three screws on the back of your EdgeRouter. Inside, you should see an on-board USB port connected to a silver 4 GB USB drive. That’s the router’s storage device. Unplug the USB drive and plug it into your computer.

Once you have the USB drive connected, you can back up your data in a number of ways. I chose to image the entire device via dd:

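lsblk # find which device name your drive was assigned (mine was /dev/sdc)
# copy the whole drive, partition table included, into a single image file
dd if=/dev/sdc of=/home/aus/backup/edge.img bs=4M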
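Since this image may be the only surviving copy of the config, it can be worth confirming that it matches the drive before doing anything destructive. The helper below is a minimal sketch, not part of the original procedure: `verify_image` is a hypothetical name, and it assumes GNU `sha256sum` is installed and that the drive is still at /dev/sdc.

```shell
# verify_image SRC IMG  (hypothetical helper, not from the original post)
# Hashes both inputs with SHA-256 and reports whether they are identical.
# SRC can be a block device such as /dev/sdc (readable as root) or a file.
verify_image() {
    # Refuse to compare if either input cannot be read at all.
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "unreadable input" >&2
        return 2
    fi
    src_sum=$(sha256sum < "$1" | cut -d' ' -f1)
    img_sum=$(sha256sum < "$2" | cut -d' ' -f1)
    if [ "$src_sum" = "$img_sum" ]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Usage, from a root shell so the device is readable:
#   verify_image /dev/sdc /home/aus/backup/edge.img
```

A mismatch on a drive that is throwing read errors does not necessarily mean the image is useless, but it is a hint to take a second copy before moving on.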

I then mounted the partitions in the image file via loopback and was able to browse all the files from my USB drive. (If you choose to image the entire disk, instead of each partition, this post will be useful to you.)
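One common way to mount a single partition out of a whole-disk image is to pass its byte offset to mount. Here is a rough sketch of that arithmetic; `partition_offset` is a hypothetical helper, the start sector 2048 is only an illustration (read the real value from `fdisk -l`), and /mnt is an assumed mount point.

```shell
# partition_offset START_SECTOR [SECTOR_SIZE]  (hypothetical helper)
# Converts a partition's start sector, as listed by `fdisk -l edge.img`,
# into the byte offset that `mount -o offset=` expects.
# SECTOR_SIZE defaults to 512 bytes.
partition_offset() {
    echo $(( $1 * ${2:-512} ))
}

# Usage (2048 is illustrative; use the "Start" column from fdisk):
#   fdisk -l /home/aus/backup/edge.img
#   sudo mount -o loop,ro,offset="$(partition_offset 2048)" \
#       /home/aus/backup/edge.img /mnt
```

Mounting read-only (`ro`) keeps the backup untouched while you browse it.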

Luckily, I was able to recover the entire filesystem.

Next, I ran fsck on both USB partitions. It corrected several errors. I plugged the USB drive back into the router and hoped that would fix the booting problem. It did not, but it may be worth a shot for you.

Recovery

Now that I had a good backup of my router config, it was time to start exploring recovery options. The first two recommended methods failed. Pressing the reset button would kick off the LED blinking process, and it seemed to have reset. But when I rebooted the router, I still could not ping 192.168.1.1. The last resort was to rebuild the USB image entirely.

Ubiquiti recommends a series of complicated steps to rebuild the USB disk. It requires setting up a TFTP server and using the serial cable to kick off the boot process. This method should be a lot simpler:

Step 0:

You know how I don’t have Internet access now? We need to solve that. I’m pretty much worthless without a Google search at the helm. I could have unplugged and moved some stuff around in my homelab, but I decided the simpler solution was to just tether out with my iPhone.

Again, I’d highly recommend getting an RJ45 serial cable. This will give you eyes into the boot process. Unfortunately, it was late at night and I would not be able to buy one until the morning. If you have one, great. If you can get one, good. If you can’t, just keep reading.

Step 1:

First, we need to get access to the USB drive. If you think your USB drive is failing, then you need to replace it. You’ll also need a USB drive that is low profile enough to fit in the USB slot. Any reliable 4+ GB USB drive will do. Many people have had good results with this SanDisk Cruzer Fit 8GB USB 2.0 Low-Profile Flash Drive.

Insert the new or existing USB drive into a Linux box.

Step 2:

We are going to use the EdgeMax Rescue Kit scripts to rebuild our new USB drive. Clone this repository.

git clone https://github.com/vyos/emrk.git

Step 3:

Before we can run the script, we need to make a few edits. If you’re not 100% comfortable with each command in this script, I would recommend running them one at a time, by hand.

Edit bin/emrk-reinstall with the following changes:

Make sure you get the script right! If you don’t, you could lose data. Now, create some temporary directories to use as mount points:

mkdir /tmp/boot
mkdir /tmp/root

Step 4:

Triple check that you have the USB device set correctly in your emrk-reinstall script. Then run the script. If all goes well, it should format your USB drive and create the proper partitions. It then will prompt you for the URL to your previously existing EdgeOS version tar file. Paste that URL in to continue.
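As one extra guard for that triple check, you could confirm that the kernel considers the target device removable before letting anything format it. This is a sketch under assumptions: `looks_like_usb_stick` and `SYSFS_ROOT` are names I made up for illustration, and while USB flash drives usually report themselves as removable in sysfs, that is not guaranteed for every drive.

```shell
# Root of sysfs. Overridable only so the check can be exercised in a test.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# looks_like_usb_stick NAME  (hypothetical helper)
# NAME is the kernel device name without /dev/, for example "sdc".
# Succeeds only if the kernel flags that device as removable.
looks_like_usb_stick() {
    flag_file="$SYSFS_ROOT/block/$1/removable"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Usage, before running emrk-reinstall:
#   if looks_like_usb_stick sdc; then
#       echo "sdc is removable, continuing"
#   else
#       echo "STOP: sdc does not look like a USB stick"
#   fi
```

This does not replace reading the lsblk output carefully; it only catches the worst case of pointing the script at an internal disk.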

Step 5:

If your script completed successfully, remove the USB drive and put it back into your EdgeRouter. Power on the router and connect an Ethernet cable between port eth0 and your NIC. Set a static IP of 192.168.1.x on your PC and try to ping 192.168.1.1. Moment of truth…
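The router can take a little while to boot, so rather than pinging by hand you might poll for it. The following is a minimal sketch: `wait_for_router` and `PROBE` are hypothetical names, the ping flags assume Linux iputils, and the PC-side interface name and the address 192.168.1.10 in the usage comment are placeholders for your own setup.

```shell
# Command used to probe the host: one ping with a 1 second timeout
# (Linux iputils syntax). Overridable so the loop can be tested offline.
PROBE=${PROBE:-ping -c 1 -W 1}

# wait_for_router HOST TRIES  (hypothetical helper)
# Probes HOST up to TRIES times, pausing one second between attempts.
wait_for_router() {
    host=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        # $PROBE is left unquoted on purpose so its flags split into words.
        if $PROBE "$host" >/dev/null 2>&1; then
            echo "$host is up"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "$host did not answer"
    return 1
}

# Usage (interface name and PC address are assumptions):
#   sudo ip addr add 192.168.1.10/24 dev eth0
#   wait_for_router 192.168.1.1 120
```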

Restore

If you can ping the router and log on to the Web UI, congratulations. If you can’t, I recommend getting a serial cable and seeing what’s up. Now we need to restore our old configuration. There are multiple ways to do this.

For me, I just updated my config.json with my old one and rebooted. After that, everything was back to normal!
