A real gaming router

We are playing GTA: Vice City on the TP-Link TL-WDR4900 wireless router.

What is it?

This is a TP-Link wireless router equipped with an external AMD Radeon GPU. The GPU connects via PCIe, the router runs Debian Linux, and you can play games on it:

What is special about this router?

TP-Link's TL-WDR4900 v1 is a very interesting WiFi router: instead of the typical MIPS or ARM CPUs found in conventional WiFi routers, the WDR4900 uses a PowerPC-based CPU from NXP.

The NXP/Freescale QorIQ P1014 CPU used in the WDR4900 is a 32-bit PowerPC processor.
These CPUs provide a full 36-bit address space; this one is high-performance (for a 2013 router) and features excellent PCIe controllers.

They quickly gained popularity in the OpenWrt and Freifunk communities, which is not surprising for a cheap router with such a high-performance CPU. The 2.4 GHz and 5 GHz WiFi chipsets (made by Qualcomm/Atheros) connect to the CPU via PCIe.

PCIe issues in embedded systems

PCIe cards are transparently mapped into the host CPU's memory space. The PCIe controller on the host CPU then forwards all requests touching a given memory region to the PCIe device responsible for that region.

Each PCIe card can provide multiple such mappings, also called "BARs" (Base Address Registers). The maximum size of these mappings varies from CPU to CPU.
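To make the idea concrete, here is a minimal user-space sketch (our illustration, not code from this project; the device address 0000:03:00.0 is a placeholder) that maps BAR0 of a PCIe device into a process via sysfs and reads its first dword:

/* Map BAR0 of a PCIe device into user space via sysfs (run as root). */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:03:00.0/resource0";
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);                     /* file size equals the BAR size */

    volatile uint32_t *bar = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    printf("BAR0: %lld bytes, first dword: 0x%08x\n",
           (long long)st.st_size, bar[0]);

    munmap((void *)bar, st.st_size);
    close(fd);
    return 0;
}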

Until recently, even a Raspberry Pi Compute Module 4 could allocate only 64 MiB of its address space for graphics cards.
Many other devices (such as MIPS-based router CPUs) are limited to just 32 MiB (or less).

Practically all modern graphics cards require a BAR address space of at least 128 MiB on the host system; that much is needed for communication with the driver. Some fairly new cards, notably Intel ARC, even require "Resizable BAR", a marketing term for very large 64-bit memory regions, which allows the entire VRAM (12+ GiB) to be mapped into the host memory space.
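To check whether a given card fits into a platform's limits, the BAR sizes can be read from sysfs; a small sketch (again with a placeholder device address):

/* Print BAR sizes by parsing the sysfs "resource" file, which holds
   one "start end flags" hex triple per BAR. */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/bus/pci/devices/0000:03:00.0/resource", "r");
    if (!f) { perror("fopen"); return 1; }

    uint64_t start, end, flags;
    for (int bar = 0;
         fscanf(f, "%" SCNx64 " %" SCNx64 " %" SCNx64,
                &start, &end, &flags) == 3;
         bar++)
        if (end > start)
            printf("BAR%d: %" PRIu64 " MiB\n",
                   bar, (end - start + 1) >> 20);

    fclose(f);
    return 0;
}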

Even with sufficient BAR space, PCIe device memory may not behave exactly like normal memory does on, say, an x86 CPU. This is why there were numerous problems when people tried to attach GPUs to the Raspberry Pi.

Similar problems (with memory ordering, caching, nGnRE mappings, alignment) occur even on big Arm64 server CPUs, forcing people to hack the kernel and come up with workarounds.
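To illustrate the ordering issue, here is a kernel-style sketch (a hypothetical device, not code from this project): on weakly ordered CPUs, a driver must explicitly order writes to normal RAM against MMIO writes, otherwise the device may see the doorbell before the data:

/* Fill a descriptor in RAM, then ring a (made-up) doorbell register. */
#include <linux/io.h>

#define DOORBELL_REG 0x40                 /* hypothetical register offset */

static void submit_desc(void __iomem *mmio, u32 *desc_ram, u32 val)
{
    desc_ram[0] = val;                    /* 1. descriptor in normal RAM */
    wmb();                                /* 2. order RAM write before MMIO */
    writel(1, mmio + DOORBELL_REG);       /* 3. tell the device to fetch it */
}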

Retrofitting a miniPCIe slot

In the factory configuration, the router does not expose PCIe for external devices. To connect the graphics card, a custom miniPCIe adapter board was developed, connected to the router with enameled copper wire:

The PCIe lanes leading from the CPU to one of the Atheros chipsets had to be cut and rerouted into a miniPCIe slot.

U-Boot reports that an AMD Radeon HD 7470 graphics card is attached to PCIe2:

U-Boot 2010.12-svn19826 (Apr 24 2013 - 20:01:21)

CPU:   P1014, Version: 1.0, (0x80f10110)
Core:  E500, Version: 5.1, (0x80212151)
Clock Configuration:
       CPU0:800  MHz,
       CCB:400  MHz,
       DDR:333.333 MHz (666.667 MT/s data rate) (Asynchronous), IFC:100  MHz
L1:    D-cache 32 kB enabled
       I-cache 32 kB enabled
Board: P1014RDB
SPI:   ready
DRAM:  128 MiB
L2:    256 KB enabled
Using default environment

PCIe1: Root Complex of mini PCIe Slot, x1, regs @ 0xffe0a000
  01:00.0     - 168c:abcd - Network controller
PCIe1: Bus 00 - 01
PCIe2: Root Complex of PCIe Slot, x1, regs @ 0xffe09000
  03:00.0     - 1002:6778 - Display controller
  03:00.1     - 1002:aa98 - Multimedia device
PCIe2: Bus 02 - 03
In:    serial
Out:   serial
Err:   serial
Net:   initialization for Atheros AR8327/AR8328
eTSEC1
auto update firmware: is_auto_upload_firmware = 0!
Autobooting in 1 seconds
=>

Installing Debian Linux

Installing OpenWrt on the router immediately gives us a kernel and a userspace, but the OpenWrt userspace is quite limited (busybox, musl libc, no libraries for graphics/games, etc.).

The default OpenWrt kernel also lacked the AMD graphics drivers. We solved the driver problem by compiling our own OpenWrt tree with the extra modules enabled, then booted this kernel via TFTP directly from U-Boot:

setenv ipaddr 10.42.100.4
tftpboot 0x2000000 10.42.100.60:wdr4900-nfs-openwrt.bin
bootm 0x2000000

Fortunately, Debian provides a port for exactly this case, "PowerPCSPE", designed for this type of CPU (e500/e500v2). On a system with statically compiled QEMU binaries and properly configured binfmt handlers, you can use the debootstrap tool from the Debian arsenal: it creates a bootable userspace from the port's mirrors:

sudo QEMU_CPU=e500v2 debootstrap --exclude=usr-is-merged --arch=powerpcspe --keyring ~/gamingrouter/debian-ports-archive-keyring-removed.gpg unstable "$TARGET" https://snapshot.debian.org/archive/debian-ports/20190518T205337Z/

debootstrap chroots into the new root filesystem and simply executes the binaries there (post-install hooks, etc.). This works transparently thanks to qemu-user-static, which takes care of executing the PowerPCSPE binaries on the amd64 host machine. The extra environment variable QEMU_CPU=e500v2 tells QEMU which CPU to emulate.

The amdgpu driver (modern AMD GPUs)

We ran the first experiments on an AMD Radeon RX570 GPU using the modern amdgpu graphics driver. The result: very strange artifacts and no (usable) image yet:

After a bit of debugging, and finally installing 32-bit x86 Linux (i386) on another computer, we noticed that the same problem occurs on any other 32-bit platform, even regular Intel PCs. Evidently amdgpu has some incompatibility with 32-bit platforms.

We opened a bug report about this, but it has not seen much analysis so far.

The radeon driver (legacy AMD GPUs)

But with an AMD Radeon HD 7470 card, which uses the older radeon driver, everything suddenly started working:

Big-endian problems

For this platform, we compiled reVC (a reverse-engineered version of GTA: Vice City whose source code is publicly available). This required preparing our own builds of premake, glfw3, glew and reVC itself.

root@gaming-router:/home/user/GTAVC# ./reVC
Segmentation fault

Oops 🙂

More work was needed. It turns out that the game and its rendering engine (at least in the decompiled version) are not at all prepared for big-endian systems. When loading game resources, structures containing offsets, sizes, counts, coordinates, etc. are read straight into memory. The data in these structures is little-endian, and it lands on a platform that is big-endian. Because of this, the game accesses memory at absurd offsets and crashes almost immediately.

We spent several days patching the game and the rendering engine librw to make this code work correctly on big-endian machines. There were over 100 places in the source code that needed patching; the patches looked something like this:

@@ -118,6 +136,7 @@ RwTexDictionaryGtaStreamRead1(RwStream *stream)
  assert(size == 4);
  if(RwStreamRead(stream, &numTextures, size) != size)
    return nil;
+  numTextures = le32toh(numTextures);

  texDict = RwTexDictionaryCreate();
  if(texDict == nil)
@@ -458,8 +477,8 @@ CreateTxdImageForVideoCard()
          RwStreamWrite(img, buf, num);
        }

-       dirInfo.offset = pos / CDSTREAM_SECTOR_SIZE;
-       dirInfo.size = size;
+       dirInfo.offset = htole32(pos / CDSTREAM_SECTOR_SIZE);
+       dirInfo.size = htole32(size);
        strncpy(dirInfo.name, filename, sizeof(dirInfo.name));
        pDir->AddItem(dirInfo);
        CStreaming::RemoveTxd(i);

After the game loads resources with RwStreamRead(), the data read into the structures has to be converted from little-endian to the byte order of the host.
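Outside the reVC sources, the pattern looks like this; a minimal standalone sketch with a hypothetical on-disk record (the real game structures differ):

/* A record stored little-endian on disk, processed on a big-endian
   host. <endian.h> provides le32toh()/htole32(). */
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

struct dir_entry {
    uint32_t offset;
    uint32_t size;
    char     name[24];
};

static int read_entry(FILE *f, struct dir_entry *e)
{
    if (fread(e, sizeof(*e), 1, f) != 1)
        return -1;
    e->offset = le32toh(e->offset);       /* LE file -> host order */
    e->size   = le32toh(e->size);
    return 0;
}

static int write_entry(FILE *f, const struct dir_entry *e)
{
    struct dir_entry out = *e;
    out.offset = htole32(out.offset);     /* host order -> LE file */
    out.size   = htole32(out.size);
    return fwrite(&out, sizeof(out), 1, f) == 1 ? 0 : -1;
}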

For operations such as saving games, settings, etc., the reverse mechanism (the htole32() direction above) is needed so files are always written little-endian.
Now we could load the game, explore the world and drive cars. But as soon as the game tried to display the player character, very strange graphical glitches appeared.

Glitches with the player model

Warning: heavy flickering ahead

The following video contains a lot of flashing/strobing. If you have epilepsy or are prone to seizures in response to light or other stimuli, please do not watch it.

With all main and secondary characters disabled, there were no visible glitches. Everything worked fine and the game was completely playable (as playable as it can be without NPCs).

We spent a few more days looking for a bug in our code. Surely we had made some mistake in implementing the big-endian support. We dumped all relevant variables, coordinates, vertices and transformations as numbers and compared them against the little-endian version of the game.

Everything looked completely normal, and we could not find any further problems.
In this state, the project froze for several months.

Wii U port

We managed to find another reVC port online: the Wii U port. The Wii U uses the IBM Espresso CPU, a PowerPC-based processor like ours, and it is also big-endian.

We contacted Gary, the author of the Wii U port, and very politely asked if we could take a look at his big-endian patches. Gary, thanks again!

By transplanting Gary's patches into the vanilla reVC codebase (discarding all Wii U-specific changes), we got reVC running on the TP-Link with Gary's well-tested patches.
And the same graphics corruption appeared as before. What is going on?!

At this point we looked everywhere for an answer, questioning every part of the system and trying to verify its sanity: the kernel, the GPU drivers, the compilers, the libraries.
PowerPC SPE is not the most common architecture (it was even removed from GCC 9), with very unusual floating-point extensions (quite different from regular PowerPC CPUs).

We disabled SPE (-mno-spe), switched floating-point code generation modes, tried e500 and e500v2 as target platforms, etc. Nothing changed.

i386 test

To make sure the code itself wasn't broken, we connected the same GPU to an x86 machine (a trusty ThinkPad T430, via ExpressCard/34). We installed the same version of Debian 10, the same libraries, the same radeon driver, the same firmware, and compiled the same reVC source code for i386.

The game worked well, no graphical flaws were observed.

Modern kernel built with LLVM

At this stage we wanted to test a new kernel (with a newer radeon driver). GCC no longer supports PowerPC SPE (GCC 8 was the last version with it), and a modern Linux 6.7 will not build with the old GCC 8. But LLVM/clang had just gained PowerPC SPE support, and Linux can be compiled with clang.

make LLVM=1 ARCH=powerpc OBJCOPY="~/binutils-2.42/build/binutils/objcopy" all -j 40 V=1
mkimage -C none -a 0x1200000 -e 0x1200000 -A powerpc -d arch/powerpc/boot/simpleImage.tl-wdr4900-v1 uImage12-nvme

We had to provide our own build of binutils (objcopy and ld) with PowerPC support.

The remaining changes needed to run a mainline kernel on the TP-Link WDR4900 turned out to be quite small:

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 968aee202..5ce3eeb09 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -181,6 +181,7 @@ src-plat-$(CONFIG_PPC_PSERIES) += pseries-head.S
 src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
 src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
 src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
+src-plat-$(CONFIG_TL_WDR4900_V1) += simpleboot.c fixed-head.S

 src-plat-$(CONFIG_PPC_MICROWATT) += fixed-head.S microwatt.c

@@ -351,7 +352,7 @@ image-$(CONFIG_TQM8548)                     += cuImage.tqm8548
 image-$(CONFIG_TQM8555)                        += cuImage.tqm8555
 image-$(CONFIG_TQM8560)                        += cuImage.tqm8560
 image-$(CONFIG_KSI8560)                        += cuImage.ksi8560
-
+image-$(CONFIG_TL_WDR4900_V1)          += simpleImage.tl-wdr4900-v1
 # Board ports in arch/powerpc/platform/86xx/Kconfig
 image-$(CONFIG_MVME7100)                += dtbImage.mvme7100

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 352d7de24..414216454 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -345,6 +345,11 @@ adder875-redboot)
     platformo="$object/fixed-head.o $object/redboot-8xx.o"
     binary=y
     ;;
+simpleboot-tl-wdr4900-v1)
+    platformo="$object/fixed-head.o $object/simpleboot.o"
+    link_address="0x1000000"
+    binary=y
+    ;;
 simpleboot-*)
     platformo="$object/fixed-head.o $object/simpleboot.o"
     binary=y
diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S
index 39724ff5a..80da35f85 100644
--- a/arch/powerpc/kernel/head_85xx.S
+++ b/arch/powerpc/kernel/head_85xx.S
@@ -968,7 +968,7 @@ _GLOBAL(__setup_ehv_ivors)
 _GLOBAL(__giveup_spe)
        addi    r3,r3,THREAD            /* want THREAD of task */
        lwz     r5,PT_REGS(r3)
-       cmpi    0,r5,0
+       PPC_LCMPI       0,r5,0
        SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
        evxor   evr6, evr6, evr6        /* clear out evr6 */
        evmwumiaa evr6, evr6, evr6      /* evr6 <- ACC = 0 * 0 + ACC */
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 9315a3b69..86ba4b5e4 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -176,6 +176,18 @@ config STX_GP3
        select CPM2
        select DEFAULT_UIMAGE

+config TL_WDR4900_V1
+    bool "TP-Link TL-WDR4900 v1"
+    select DEFAULT_UIMAGE
+    select ARCH_REQUIRE_GPIOLIB
+    select GPIO_MPC8XXX
+    select SWIOTLB
+    help
+      This option enables support for the TP-Link TL-WDR4900 v1 board.
+
+      This board is a Concurrent Dual-Band wireless router with a
+      Freescale P1014 SoC.
+
 config TQM8540
        bool "TQ Components TQM8540"
        help
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 43c34f26f..55268278d 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
 obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
 obj-$(CONFIG_FB_FSL_DIU)       += t1042rdb_diu.o
 obj-$(CONFIG_STX_GP3)    += stx_gp3.o
+obj-$(CONFIG_TL_WDR4900_V1) += tl_wdr4900_v1.o
 obj-$(CONFIG_TQM85xx)    += tqm85xx.o
 obj-$(CONFIG_PPA8548)     += ppa8548.o
 obj-$(CONFIG_SOCRATES)    += socrates.o socrates_fpga_pic.o
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index b2d8c0da2..21bc5f06b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -272,7 +272,7 @@ config TARGET_CPU
        default "e300c2" if E300C2_CPU
        default "e300c3" if E300C3_CPU
        default "G4" if G4_CPU
-       default "8540" if E500_CPU
+       default "8548" if E500_CPU
        default "e500mc" if E500MC_CPU
        default "powerpc" if POWERPC_CPU

The result was a bootable kernel, but the graphical defects were still there. Still, it was very nice to be completely rid of the OpenWrt toolchain.

qemu-user-static using llvmpipe

To simplify debugging, we copied the root filesystem to a local amd64 machine (again using qemu-user-static) and configured the X server with a dummy/virtual monitor. We then connected via x11vnc to be able to watch this virtual display.

Section "Device"
    Identifier  "Configured Video Device"
    Driver      "dummy"
    VideoRam    256000
EndSection

Section "Monitor"
    Identifier  "Configured Monitor"
    HorizSync   60.0 - 1000.0
    VertRefresh 60.0 - 200.0
    ModeLine    "640x480"   23.75  640 664 720 800  480 483 487 500 -hsync +vsync
              # "1920x1080" 148.50 1920 2448 2492 2640 1080 1084 1089 1125 +Hsync +Vsync
EndSection

Section "Screen"
    Identifier  "Default Screen"
    Monitor     "Configured Monitor"
    Device      "Configured Video Device"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "640x480"
    EndSubSection
EndSection

Inside the chroot (with QEMU_CPU set to e500v2), we started Xorg, x11vnc and finally reVC:

export LIBGL_ALWAYS_SOFTWARE=true   # force software rendering in Mesa
export GALLIUM_DRIVER=llvmpipe      # select the llvmpipe rasterizer
export DISPLAY=:2                   # talk to the dummy X server

Xorg -config /etc/xorg.conf :2 &
x11vnc -display :2 &
xrandr --output default --mode "800x600"
/home/user/GTAVC/reVC

Even though this setup was absurdly slow (about one frame every ~20 s), everything worked. Player models rendered too, without any graphical glitches. The main differences were as follows:

  • QEMU emulates a CPU rather than real hardware;
  • llvmpipe instead of radeon/r600.

Then we set GALLIUM_DRIVER=llvmpipe on the real hardware. This degraded performance even further (about 1 fps!), but everything worked!

There were no noticeable graphical defects (although we had to wait almost an hour to get into the game…).

Mesa update

Then we set about updating Mesa on the router. This also required updating a number of dependencies: cmake, libglvnd, meson and libdrm, and finally Mesa itself, all compiled from scratch, taking the code either directly from git or from the latest releases.

After installing the new libglvnd, libdrm and Mesa, character rendering worked correctly on the real hardware (with acceleration!). We still haven't found the actual root cause (or which library it was in), but we were more than happy to have the problem solved.

Result

