Ayke van Laethem

Implementing parallelism

How we added threading and multicore support to TinyGo

@ayke@hachyderm.io
@aykevl

Concurrency != parallelism

Why now in TinyGo?

  • Linux
  • WebAssembly
  • Baremetal (RP2040)

1:1 threading?

Source:
https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/

What needs to change?

  • scheduler
  • garbage collector
  • chan, select
  • package sync
  • package sync/atomic
  • misc: println, runtime.NumCPU, etc.

Futex!

What even is a futex?

Futex API

  • wait(address *atomic.Uint32, expected uint32)
  • wakeOne(address *atomic.Uint32)
  • wakeAll(address *atomic.Uint32)

						// In the kernel:
						var waitingThreads = make(map[*atomic.Uint32][]*OSThread)

						func wait(address *atomic.Uint32, expected uint32) {
							// do atomically:
							if address.Load() == expected {
								waitingThreads[address] = append(waitingThreads[address], currentThread())
								// and now wait
							}
						}
					
Platform      API
Linux         see futex(2)
macOS         __ulock_wait2, __ulock_wake
Windows       WaitOnAddress, WakeByAddressSingle, WakeByAddressAll
WebAssembly   memory.atomic.wait, memory.atomic.notify

More information:
https://outerproduct.net/futex-dictionary.html
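
The wait/wake semantics can be tried out in ordinary Go. The following is a minimal user-space sketch, not how any kernel implements futexes: the `futexTable` type, its mutex, and the channel-per-waiter scheme are assumptions made for illustration. The point it shows is that the comparison and the enqueue happen under one lock, so a wake that follows a store cannot be missed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// futexTable is a toy user-space stand-in for the kernel's table of waiters.
// Hypothetical type for illustration; real futexes live in the kernel.
type futexTable struct {
	mu      sync.Mutex // makes "compare and enqueue" atomic with respect to wakes
	waiters map[*atomic.Uint32][]chan struct{}
}

func newFutexTable() *futexTable {
	return &futexTable{waiters: make(map[*atomic.Uint32][]chan struct{})}
}

// wait blocks only if *address still holds expected when checked under the lock.
func (t *futexTable) wait(address *atomic.Uint32, expected uint32) {
	t.mu.Lock()
	if address.Load() != expected {
		t.mu.Unlock()
		return // value already changed: do not sleep
	}
	ch := make(chan struct{})
	t.waiters[address] = append(t.waiters[address], ch)
	t.mu.Unlock()
	<-ch // sleep until a wake closes the channel
}

// wakeOne wakes at most one thread waiting on address.
func (t *futexTable) wakeOne(address *atomic.Uint32) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if queue := t.waiters[address]; len(queue) > 0 {
		close(queue[0])
		t.waiters[address] = queue[1:]
	}
}

// wakeAll wakes every thread waiting on address.
func (t *futexTable) wakeAll(address *atomic.Uint32) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, ch := range t.waiters[address] {
		close(ch)
	}
	delete(t.waiters, address)
}

// demo parks a goroutine on a flag, then sets the flag and wakes it.
func demo() uint32 {
	table := newFutexTable()
	var flag atomic.Uint32
	seen := make(chan uint32)
	go func() {
		for flag.Load() == 0 {
			table.wait(&flag, 0)
		}
		seen <- flag.Load()
	}()
	flag.Store(1)        // change the value first...
	table.wakeAll(&flag) // ...then wake, so the waiter cannot miss it
	return <-seen
}

func main() {
	fmt.Println("waiter observed flag =", demo())
}
```

Callers wrap `wait` in a loop that re-checks the value, because a return from `wait` only means "the value may have changed".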

Wrapped futex (TinyGo)


						type Futex struct {
							atomic.Uint32
						}

						func (f *Futex) Wait(expected uint32) {
							wait(&f.Uint32, expected)
						}

						func (f *Futex) Wake() {
							wakeOne(&f.Uint32)
						}

						func (f *Futex) WakeAll() {
							wakeAll(&f.Uint32)
						}
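
To show what can be built on such a wrapper, here is a sketch of a mutex using the well-known three-state futex design (0 = unlocked, 1 = locked, 2 = locked with possible waiters). It is not claimed to be TinyGo's implementation. So that the example runs on a regular Go toolchain, `Futex.Wait` and `Futex.Wake` are backed by a global `sync.Cond`; that backing, and the names `futexLock`, `futexCond`, and `countTo`, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hosted stand-in for the futex primitives: one global lock and condition
// variable shared by all futexes. Assumption for illustration only; it causes
// spurious wakeups, which futex callers have to tolerate anyway.
var (
	futexLock sync.Mutex
	futexCond = sync.NewCond(&futexLock)
)

type Futex struct {
	atomic.Uint32
}

// Wait sleeps only if the value still equals expected.
func (f *Futex) Wait(expected uint32) {
	futexLock.Lock()
	if f.Load() == expected {
		futexCond.Wait()
	}
	futexLock.Unlock()
}

// Wake wakes waiters. This stand-in wakes all of them, not just one.
func (f *Futex) Wake() {
	futexLock.Lock()
	futexCond.Broadcast()
	futexLock.Unlock()
}

// Mutex states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters.
type Mutex struct {
	futex Futex
}

func (m *Mutex) Lock() {
	if m.futex.CompareAndSwap(0, 1) {
		return // fast path: uncontended
	}
	// Slow path: mark the lock contended and sleep until it is released.
	for m.futex.Swap(2) != 0 {
		m.futex.Wait(2)
	}
}

func (m *Mutex) Unlock() {
	if m.futex.Swap(0) == 2 {
		m.futex.Wake() // someone may be sleeping
	}
}

// countTo increments a shared counter from several goroutines under the mutex.
func countTo(goroutines, perGoroutine int) int {
	var (
		mu      Mutex
		counter int
		wg      sync.WaitGroup
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countTo(4, 1000)) // prints 4000
}
```

The uncontended path is a single compare-and-swap; the futex is only touched when a second goroutine actually has to sleep.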
					

sync.WaitGroup


						type WaitGroup struct {
							futex task.Futex // wrapped atomic.Uint32
						}

						func (wg *WaitGroup) Add(delta int) {
							if wg.futex.Add(uint32(delta)) == 0 {
								wg.futex.WakeAll()
							}
						}

						func (wg *WaitGroup) Wait() {
							for {
								counter := wg.futex.Load()
								if counter == 0 { break }
								wg.futex.Wait(counter)
							}
						}

						func (wg *WaitGroup) Done() {
							wg.Add(-1)
						}
					

Channels


						// The runtime implementation of the Go 'chan' type.
						type channel struct {
							closed       bool
							selectLocked bool
							elementSize  uintptr
							bufCap       uintptr // 'cap'
							bufLen       uintptr // 'len'
							bufHead      uintptr
							bufTail      uintptr
							senders      chanQueue
							receivers    chanQueue
							lock         task.PMutex
							buf          unsafe.Pointer
						}
					

select


						var chan1 = make(chan int)
						var chan2 = make(chan int)

						func foo() {
							select {
								case <-chan1:
								case chan2 <- 1: // deadlock!
							}
						}

						func bar() {
							select {
								case chan2 <- 1:
								case <-chan1: // deadlock!
							}
						}
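
The deadlock above arises when each select locks its channels in the order the cases are written. One general remedy, used by the main Go runtime, is to always take the locks in a single global order. The sketch below illustrates that technique only; it is not claimed to be what TinyGo does. The `channel` stand-in, its `id` ordering key, and the helpers `lockAll`, `unlockAll`, and `hammer` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// channel is a hypothetical stand-in for the runtime channel: only the lock
// and an ordering key are modelled here.
type channel struct {
	id   int // stable ordering key; a runtime could use the address instead
	lock sync.Mutex
}

// lockAll locks every distinct channel in ascending id order and returns the
// locked set, so that all selects acquire their locks in the same order.
func lockAll(chans ...*channel) []*channel {
	sorted := append([]*channel(nil), chans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].id < sorted[j].id })
	locked := make([]*channel, 0, len(sorted))
	for i, ch := range sorted {
		if i > 0 && ch == sorted[i-1] {
			continue // same channel listed twice in one select
		}
		ch.lock.Lock()
		locked = append(locked, ch)
	}
	return locked
}

// unlockAll releases the locks returned by lockAll.
func unlockAll(locked []*channel) {
	for _, ch := range locked {
		ch.lock.Unlock()
	}
}

// hammer runs two goroutines that name the same two channels in opposite
// order, like foo and bar, and returns how many rounds completed.
func hammer(rounds int) int {
	a, b := &channel{id: 1}, &channel{id: 2}
	completed := 0
	var wg sync.WaitGroup
	run := func(first, second *channel) {
		defer wg.Done()
		for i := 0; i < rounds; i++ {
			locked := lockAll(first, second)
			completed++ // safe: both channel locks are held here
			unlockAll(locked)
		}
	}
	wg.Add(2)
	go run(a, b)
	go run(b, a)
	wg.Wait()
	return completed
}

func main() {
	fmt.Println(hammer(1000)) // prints 2000
}
```

With case-order locking, the same two goroutines could each hold one lock and wait forever for the other.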
					

Garbage collector

Mark phase:

  1. Send POSIX signal to every other thread
  2. Scan current stack
  3. Wait for other threads to finish scanning their stack
  4. Scan globals
  5. Allow other threads to continue
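
The five steps can be simulated in plain Go to make the ordering visible. This is a sketch under loud assumptions: goroutines stand in for OS threads, a channel send stands in for the POSIX signal, and "scanning" is just a log entry. The types `thread` and `markCycle` and the functions `mark` and `simulate` are hypothetical names, not TinyGo runtime APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// markCycle carries the state shared by one mark phase.
type markCycle struct {
	scanned sync.WaitGroup // counts threads that still have to scan their stack
	resume  chan struct{}  // closed when the threads may continue
}

// thread is a hypothetical stand-in for an OS thread. The channel plays the
// role of the POSIX signal.
type thread struct {
	id     int
	signal chan *markCycle
}

// run acts as the signal handler: scan the own stack, report, wait for resume.
func (t *thread) run(log func(string)) {
	for cycle := range t.signal {
		log(fmt.Sprintf("thread %d: scan stack", t.id))
		cycle.scanned.Done()
		<-cycle.resume
	}
}

// mark follows the five steps of the mark phase on the collecting thread.
func mark(others []*thread, log func(string)) {
	cycle := &markCycle{resume: make(chan struct{})}
	cycle.scanned.Add(len(others))
	for _, t := range others {
		t.signal <- cycle // 1. signal every other thread
	}
	log("collector: scan stack")   // 2. scan current stack
	cycle.scanned.Wait()           // 3. wait for the other threads
	log("collector: scan globals") // 4. scan globals
	close(cycle.resume)            // 5. let the other threads continue
}

// simulate runs one mark phase with n other threads and returns the event log.
func simulate(n int) []string {
	var mu sync.Mutex
	var events []string
	log := func(event string) {
		mu.Lock()
		events = append(events, event)
		mu.Unlock()
	}
	threads := make([]*thread, n)
	for i := range threads {
		threads[i] = &thread{id: i + 1, signal: make(chan *markCycle)}
		go threads[i].run(log)
	}
	mark(threads, log)
	for _, t := range threads {
		close(t.signal) // let the simulated threads exit
	}
	mu.Lock()
	defer mu.Unlock()
	return events
}

func main() {
	for _, event := range simulate(2) {
		fmt.Println(event)
	}
}
```

The stack scans may appear in any order, but the globals are always scanned last, after every thread has reported.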

RP2040

[Figure: before/after comparison on the RP2040; images not preserved]

Baremetal futex?


						type Futex struct {
							atomic.Uint32

							waiters Stack // linked list of waiting goroutines
						}

						func (f *Futex) Wait(expected uint32) {
							spinlockTake()
							if f.Uint32.Load() == expected {
								f.waiters.Push(Current())
								spinlockRelease()
								Pause()
							} else {
								spinlockRelease()
							}
						}

						func (f *Futex) Wake() {
							spinlockTake()
							if t := f.waiters.Pop(); t != nil {
								scheduleGoroutine(t)
							}
							spinlockRelease()
						}
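
The helpers `spinlockTake` and `spinlockRelease` are not shown on the slide. As a rough idea of what they do, here is a portable sketch using a compare-and-swap loop. This is an assumption, not the RP2040 code: the Cortex-M0+ has no compare-and-swap instruction, so on that chip the helpers would more plausibly use its hardware spinlocks. `runtime.Gosched` and the `spinCount` demo are only there so the example behaves well on a hosted Go toolchain.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinlock is 0 when free and 1 when held.
var spinlock atomic.Uint32

// spinlockTake busy-waits until it manages to flip the lock from 0 to 1.
func spinlockTake() {
	for !spinlock.CompareAndSwap(0, 1) {
		runtime.Gosched() // hosted stand-in for spinning on another core
	}
}

// spinlockRelease frees the lock. Only the holder may call it.
func spinlockRelease() {
	spinlock.Store(0)
}

// spinCount increments a shared counter from several goroutines while
// holding the spinlock for each increment.
func spinCount(goroutines, perGoroutine int) int {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				spinlockTake()
				counter++
				spinlockRelease()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(spinCount(4, 1000)) // prints 4000
}
```

A spinlock is acceptable here because the critical sections in `Wait` and `Wake` are only a few instructions long.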
						
					

Future work

  • Clean up and merge!
  • WebAssembly threads support
  • ESP32 support
  • Performance improvements

Questions?

Slides:
https://aykevl.nl/talks/2025-02-01-fosdem/

How to find me:
@ayke@hachyderm.io
@aykevl