
GitLab with HTTPS in a reverse proxy environment

In recent work, there was a problem: how to provide an HTTPS GitLab in a reverse proxy environment?

It's hard for me since I am a noob at server settings XD. After a long discussion and testing with Terry Chu, there are two possible solutions to serve GitLab over HTTPS.

Solution 1

client --- (https, let's encrypt) ---- Caddy ---- (http) ------------------- gitlab

Solution 2

client --- (https, let's encrypt) ---- Caddy ---- (https, self-signed) ---- gitlab

If the GitLab HTTPS site is only accessed through a browser, solution 1 is enough. But it's not sufficient when GitLab serves as a Go repository server (e.g. go get -u). In that case, you have to use solution 2.

You can see the gist to get more details.

Memo - Create a self-signed key

  • Ref:
  • Command memo
openssl genrsa -out gitlab.key 2048
openssl req -new -key gitlab.key -out gitlab.csr
openssl x509 -req -days 3650 -in gitlab.csr -signkey gitlab.key -out gitlab.crt
openssl dhparam -out dhparam.pem 2048
mkdir -p ./data/gitlab/certs
cp gitlab.key gitlab.crt dhparam.pem ./data/gitlab/certs/

Solution 1 - backup

If you are still interested in solution 1, the following settings are my notes.

  • Caddyfile - inside the site block:

    proxy / localhost:10080 {
        fail_timeout 0s
        header_upstream X-Forwarded-Ssl on
    }
    tls {
        dns gandiv5
    }
  • Dockerfile - Add NGINX_X_FORWARDED_PROTO=http to gitlab's environment variables.

MapReduce paper note

Reading paper/book


  • MapReduce: a programming framework with which you can write code to process large datasets in a distributed filesystem like HDFS.
  • HDFS: shared-nothing principle
    • Any coordination between nodes is done at the software level, using a conventional network
    • File blocks are replicated on multi-machines.
  • Has to implement two callback functions: mapper, reducer
  • Mapper: called once for every input record; generates any number of key-value pairs
  • Reducer: takes the key-value pairs produced by the mappers; the framework collects all the values belonging to the same key and calls the reducer with an iterator over that collection of values.
  • A MapReduce program is called a job. A real problem may be solved by several jobs => workflow (Pig, Hive, Cascading, Crunch and FlumeJava)
  • If the reduce function is associative and commutative, the program can set a combine function to reduce the overhead.
  • MapReduce has no concept of indexes (an index corresponds to a column in a database), so joins over the raw data are expensive without extra work.
  • The reducers connect to each of the mappers and download the files of sorted key-value pairs for their partition => shuffle (a confusing term)
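The mapper/reducer/shuffle flow above can be sketched in a few lines of plain Python (a toy word count, not tied to any real MapReduce framework; all function names here are mine):

```python
from collections import defaultdict

def mapper(record):
    # Called once per input record; emits any number of key-value pairs.
    for word in record.split():
        yield (word, 1)

def shuffle(pairs):
    # Group all values belonging to the same key - what the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Called once per key with an iterator over that key's values.
    yield (key, sum(values))

records = ["the quick fox", "the lazy dog"]
pairs = [kv for r in records for kv in mapper(r)]
result = dict(kv for k, vs in shuffle(pairs).items() for kv in reducer(k, vs))
# result["the"] == 2
```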


Fault Tolerance

  • Section 3-1 in original paper
  • Worker
    • The master checks every worker periodically
    • If worker failure
      • If it has an unfinished map/reduce task: re-execute it
      • If it completed a map task: re-execute it, since the intermediate data is on the failed worker's local disk.
      • If it completed a reduce task: no need to re-execute, since the result is in the global file system (e.g. Google File System)
  • Master Failure
    • Save master's snapshot periodically
    • If master failure, pick another one node to be a new master.
    • It's not an easy problem.
  • Multiple workers may run the same task and generate the same output
    • Rely on atomic commits (Yen3: Need more study)
    • Map task
      • If the master has the record for the map task, ignore it.
      • If not, record it with the intermediate filenames.
    • Reduce task - Rely on atomic rename for the reduce task.
      • The same reduce task generate the same output filename.
      • The guarantee is provided by filesystem.
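The atomic-rename guarantee for reduce outputs can be illustrated with a small Python sketch (my own illustration, not GFS code): every attempt writes to a private temporary file and then renames it over the final name, so duplicate attempts still yield one consistent output file.

```python
import os
import tempfile

def commit_reduce_output(final_path, data, attempt_id):
    # Each attempt writes to its own private temporary file first ...
    tmp_path = f"{final_path}.attempt-{attempt_id}"
    with open(tmp_path, "w") as f:
        f.write(data)
    # ... then atomically renames it to the final output name.
    # os.replace is atomic on POSIX, so even if two attempts race,
    # the final file always contains one complete copy of the output.
    os.replace(tmp_path, final_path)

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "part-00000")
    commit_reduce_output(out, "key1\t42\n", attempt_id=0)
    commit_reduce_output(out, "key1\t42\n", attempt_id=1)  # duplicate attempt
    content = open(out).read()
```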


Locality

  • Section 3-4 in original paper
  • Network bandwidth is a scarce resource
  • Google File System stores several copies in different machine (typically 3 copies)
  • When the schedule allocate a map task, it takes the location information
    • Attempts to schedule a map on the machine which has a replica
    • Failing that, attempts to schedule it on a machine on the same network switch as one containing the data.
    • Most input data is read locally and does not consume network bandwidth.

Backup Tasks

  • Section 3-6 in original paper
  • Straggler: a machine that takes an unusually long time to complete one task.
    • Reason: resource starvation.
    • Resolution: run backup tasks for the last unfinished (in-progress) tasks to reduce the total time.
    • Result: in the distributed sort benchmark, the job takes 44% longer to finish when this mechanism is disabled.

Refinement - Partition Function

  • Section 4-1 in original paper
  • In the general case, the default hash function partitions data fairly, but it's not suitable for every situation. For example, if the key is a URL, the partition function can be redesigned over an attribute of the key, like hash(Hostname(urlkey))
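A hostname-based partitioner like hash(Hostname(urlkey)) can be sketched in Python (my own illustration; crc32 stands in for the framework's hash function, and R is an assumed number of reduce tasks):

```python
import zlib
from urllib.parse import urlparse

R = 4  # assumed number of reduce tasks

def default_partition(key):
    # Default behaviour: hash the whole key.
    return zlib.crc32(key.encode()) % R

def host_partition(urlkey):
    # hash(Hostname(urlkey)) mod R: all URLs from the same host
    # land in the same reduce partition.
    host = urlparse(urlkey).hostname
    return zlib.crc32(host.encode()) % R

a = host_partition("http://example.com/page1")
b = host_partition("http://example.com/page2")
# a == b: same host, same partition
```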

Refinement - Ordering Guarantees

  • Section 4-2 in original paper
  • If the keys in a partition are sorted, it is easy to generate sorted output => supports random-access lookup and so on.

Refinement - Combiner Function

  • Section 4-3 in original paper
  • Combiner function: a partial reduce on the local machine. After it runs, the partial result is passed to the reduce task, limiting network traffic, CPU load, etc.
  • The combiner function is typically the same as the reduce function.


Refinement - Input and Output Types

  • Section 4-4 in original paper
  • Support different input/output types.
  • Supports reading data from different sources (e.g. a database). You need to implement a Reader class

Refinement - Skipping Bad Records

  • Section 4-6 in original paper
  • Certain records cause the map/reduce function to crash deterministically.
    • If the bug is in the mapper/reducer, fix it.
    • If the bug is in a third-party library and the result can tolerate some data loss (e.g. statistical analysis), skip the record.
    • Record the data record number before the record is processed by the mapper/reducer. If the task crashes, send the record information to the master so the record is not processed again.

Refinement - Local Execution

  • Section 4-7 in original paper
  • It's hard to debug a distributed system, so provide a local-execution mode for debugging (on a single machine).

Refinement - Counters

  • Section 4-9 in original paper
  • Provides a global counter facility for each job, e.g. for statistical analysis.

Reduce-side/Map-side joins

  • Yen3: Need more study.
  • Reduce-side joins: the actual join logic runs in the reducers -> secondary sort
  • Map-side joins: if you can make certain assumptions about your input data, it is possible to make joins faster by using map-side joins.
    • How to deal with multiple data sources? Use reduce-side joins and map-side joins.
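The general idea of a reduce-side join can be sketched in plain Python (a toy illustration, not Hadoop code; records are tagged with their source in the map phase, grouped by join key in the shuffle, and combined in the reduce phase):

```python
from collections import defaultdict

users  = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "mug")]

# Map phase: tag each record with its source so the reducer can tell them apart.
pairs = [(uid, ("user", name)) for uid, name in users]
pairs += [(uid, ("order", item)) for uid, item in orders]

# Shuffle: group by join key.
groups = defaultdict(list)
for key, tagged in pairs:
    groups[key].append(tagged)

# Reduce phase: join within each group.
joined = []
for uid, tagged in groups.items():
    names = [v for tag, v in tagged if tag == "user"]
    items = [v for tag, v in tagged if tag == "order"]
    for name in names:
        for item in items:
            joined.append((uid, name, item))
# joined contains (1, "alice", "book"), (1, "alice", "pen"), (2, "bob", "mug")
```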


  • Network bandwidth is a scarce resource.
  • One benefit of MapReduce is that the sequential behaviour is the same as the parallel behaviour, which makes it easy to debug.

Appendix: Google File System (GFS)

  • Yen3: Need to study in the near future.
  • A distributed file system based on the shared-nothing principle. It does not require special hardware.
  • A central node (the Name Node) keeps track of which file blocks are stored on which machine. Each machine runs an HDFS daemon that allows other nodes to access the files stored on that machine.
  • Conceptually, HDFS creates one big file system out of the disk space of all machines running the daemon.
  • Fault tolerance: to survive machine and disk failures, file blocks are replicated on multiple machines.
    • Compared to RAID
      • Same: uses redundant data to avoid data loss.
      • Different: no special hardware required (RAID may need it); just a conventional datacenter network.
  • Good system papers -- details from apps all the way to network.

Python package note

Start a private pypi server

Start a pypi server with a docker container (yen3/pypiserver)

mkdir -p /tmp/pypi/packages
docker run -d -p 8080:8080 -v /tmp/pypi/packages:/data/packages yen3/pypiserver

Setting for the pypiserver

  • Add $HOME/.pypirc

    [distutils]
    index-servers = private_pypi_server

    [private_pypi_server]
    repository: http://localhost:8080
    username: admin
    password: admin
  • Add $HOME/.pip/pip.conf (Maybe you have to run mkdir $HOME/.pip first)

    [global]
    extra-index-url = http://localhost:8080
    trusted-host = localhost
  • Add $HOME/.pydistutils.cfg

    [easy_install]
    index-url = http://localhost:8080

Install package from server

  • With setting - as normal

    pip install <package-name>
    easy_install <package-name>   # If the package has egg only
  • Without any setting - if you haven't added $HOME/.pypirc and $HOME/.pip/pip.conf. This is useful when you want to install packages in a docker container that has no such settings.

    pip install --extra-index-url http://localhost:8000 --trusted-host localhost <package-name>
    easy_install --index-url=http://localhost:8000 <package-name>

Package and Upload python programs/libraries

Package your project

Assume you already have a python library/program and ready to package.

In the following context, the security concern is to package it without shipping source code.

There are several ways to build a python package. By security concern, the package commands can be divided into

  • python setup.py sdist: source package, including everything
  • python setup.py bdist_wheel: binary package, including all *.py files and compiled C extensions (*.so, *.o, etc.)
  • python setup.py bdist_egg --exclude-source-files: binary package, including all *.pyc files and compiled C extensions

How do you choose for your project?

  • If you have no security concern, use python setup.py sdist (or python setup.py bdist_wheel)
  • If you have a security concern, use python setup.py bdist_egg --exclude-source-files. But the output package depends on the python version.
    • If you choose this way, the package can only be installed through easy_install
    • The other possible way is to use python setup.py bdist_wheel. After all packages are installed, run python3 -m compileall -b . in the source folder and remove all *.py files.
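The compileall route mentioned above can be sketched like this (a minimal illustration; legacy=True in the stdlib compileall module corresponds to the -b flag, writing foo.pyc next to foo.py instead of into __pycache__):

```python
import compileall
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as d:
    # A tiny stand-in package with one module.
    pkg = pathlib.Path(d) / "mypkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("ANSWER = 42\n")

    # Byte-compile in place (equivalent to `python3 -m compileall -b .`).
    compileall.compile_dir(d, legacy=True, quiet=1)

    # Then remove the sources, shipping only the .pyc files.
    for py in pkg.glob("*.py"):
        py.unlink()

    files = sorted(p.name for p in pkg.iterdir())
# files == ['__init__.pyc']
```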

Python2 or Python3 package?

As noted in the previous section, the python version you run decides the output package version. If your python is python2/python3, it generates a python2/python3 package (except for sdist packages). The packaging program packages your project without any checks (e.g. syntax checks); you have to verify everything is OK yourself.

If you run python3 to package a python2-only project, it will not fail. But your users will suffer from the broken package.

Universal package

It's possible to build a package that supports both python 2.7 and 3, but the assumption is that you take care of everything about source-code compatibility between py2.7 and py3. If you are sure the source code runs smoothly on both versions, you can package it with the command

python setup.py bdist_wheel --universal

Or add these contents to setup.cfg

    [bdist_wheel]
    universal = 1
You can get more details from the official documents

Upload your package

  1. Assume there is a pypi server and it's named private_pypi_server in $HOME/.pypirc
  2. The commands to upload your package are the same as the package commands plus upload -r private_pypi_server, e.g.
    • python setup.py sdist upload -r private_pypi_server
    • python setup.py bdist_wheel upload -r private_pypi_server
    • python setup.py bdist_egg --exclude-source-files upload -r private_pypi_server

Remove your uploaded package

  • Based on the current pypiserver's structure, all files are on /share/private_pypi_server/public/yen3/pypi. If you uploaded wrong packages or any other needs, you can find the file in the folder and remove it.


  1. I can not execute python setup.py bdist_wheel
    • Install wheel (pip install wheel) and run it again.
    • If the package defines a custom command, you have to check it to get more details.
  2. What's difference between egg and wheel?
    • See the official docs:
  3. I upload and overwrite the package with the same version but pip installs the previous package.
    • When you package your project, remember to remove these files/folders
      • *.pyc
      • build/
      • dist/
      • __pycache__/
    • Remove the installed package first (pip uninstall <package-name>) and then
      • Remove $HOME/.cache/pip
      • Try pip install -U --no-cache-dir <package-name>
    • If the problem is still unsolved
      • Roll your package version
      • Remove your python and reinstall it (if you use pyenv, it's easy to reinstall a new python)


  • Official document

Build a multi-arch docker image

This blog post contains nothing new; it's my personal note. If you have any suggestions for me or the post, please feel free to tell me. I'd be glad to know.

I have to build a non-x86 docker image at work. This made me study how the official docker images support multi-arch pulls: when you send a pull request to the docker registry, it checks the arch of your machine and responds with the corresponding image.

How do you build a multi-arch docker image like this? If you build the image on macOS, it's quite simple: just build it, since docker on macOS runs inside a hypervisor.

What about Linux? You have to provide a qemu binary to run non-x86 binaries. The overall steps are

  1. Prepare a static qemu binary
  2. Register a binfmt_misc setting with the static qemu binary
  3. Build the docker image with the static qemu binary
  4. Combine the per-arch images with manifest-tool

First, we have to know how to run a non-x86 docker image on an x86-64 linux platform. I take the aarch64 arch as the example in this blog post. You can replace aarch64 with any non-x86 arch.

Run aarch64 docker image in x86-64 linux

Run aarch64 binary in x86-64 linux

How do you run an aarch64 binary on an x86 platform? Use qemu to emulate it. For example, if there is an aarch64 binary named a.out and we want to run it with qemu, the command is qemu-aarch64 ./a.out.

Going further, register binfmt with the aarch64 format to avoid calling qemu-aarch64 every time. After registering, you can run ./a.out directly. When ./a.out is executed, the kernel inspects the binary, sees that it is for aarch64, and calls qemu-aarch64 with it.

For example:

export QEMU_BIN_PATH=/usr/local/bin/qemu-aarch64

# Register binfmt for aarch64
echo ":qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:$QEMU_BIN_PATH:" > /proc/sys/fs/binfmt_misc/register

# Run the real command

# (Optional) Unregister binfmt for aarch64
echo -1 > /proc/sys/fs/binfmt_misc/qemu-aarch64

Run aarch64 docker image

The main concept is like the previous section but more complicated. The major idea is to mount the qemu binary into the image. When a binary in the container is executed, the kernel (the container uses the host kernel) checks the binary format and runs the binary with the qemu binary. Inside the container, the rootfs is the container's rootfs, so qemu has to be mounted into the container. The solution sounds good but still has a problem: if qemu is built with dynamic linking, it looks for shared libraries at run time, and when it runs inside the container it cannot find them. The simple solution is to build a statically linked qemu binary to avoid this behaviour.

To simplify the example, I made a helper docker image (yen3/binfmt-register) to run.

export cpu=aarch64
export RUN_IMAGE=arm64v8/alpine

# Register binfmt
docker run --rm --privileged yen3/binfmt-register set ${cpu}

# Get qemu static binary
docker run --rm yen3/binfmt-register get ${cpu} > qemu-${cpu}-static
chmod +x qemu-${cpu}-static

# Run the image with the qemu static binary
export MOUNT_QEMU="-v $(pwd)/qemu-${cpu}-static:/usr/local/bin/qemu-${cpu}-static"
docker run -it --rm ${MOUNT_QEMU} ${RUN_IMAGE} uname -a

# Unregister binfmt
docker run --rm --privileged yen3/binfmt-register clear ${cpu}

# Remove the qemu static binary
rm -f qemu-${cpu}-static

The complete example is here.

How to get static-linked qemu binary ?

  • On ubuntu, install the qemu-user-static package (apt-get install -y qemu-user-static). The static binaries can be found at /usr/bin/qemu-*-static
  • On alpine, install the qemu-${cpu} package. The static binary can be found at /usr/bin/qemu-${cpu}
  • Build qemu from source code. The Dockerfile provides a simple example to build qemu.

Build aarch64 docker image in x86-64 platform

In the previous section, we learned how to run an aarch64 binary and docker image on an x86-64 platform. How do we apply the concept to build an aarch64 image?

The naive approach is to mount the binary when building the docker image. Unfortunately, docker cannot mount volumes during build. To work around the problem, use a multi-stage build to achieve the same effect.

  1. Prepare the dockerfile and copy the qemu static binary into the docker image. After the build steps are finished, remove the binary.

    FROM yen3/binfmt-register:0.1 as builder
    FROM arm64v8/alpine
    # Add qemu binary to run the image in x86-64 platform
    # The binary is unused in macOS docker
    COPY --from=builder /qemu/qemu-aarch64-static /usr/local/bin/qemu-aarch64-static
    # Do something what you want to do
    # ...
    # Remove the binary. It's unused in the final result
    RUN rm -f /usr/local/bin/qemu-aarch64-static
  2. Run the docker file with the helper docker image (with privileged access to set binfmt)

    # Register binfmt for aarch64
    docker run --privileged yen3/binfmt-register set aarch64
    # Build the docker image
    docker build -t yen3/test:arm64 .
    # Unregister binfmt for aarch64
    docker run --privileged yen3/binfmt-register clear aarch64

Pull a multi-arch docker image

So far we have several docker images for different cpu archs. We don't combine these images into one image; instead, manifest-tool helps us create a manifest image that references them.

Assume there are three images: yen3/test:amd64 for x86-64, yen3/test:arm32v7 for arm, and yen3/test:arm64 for aarch64 (remember to push them to dockerhub first). manifest-tool can create an image that references them.

  1. Write a config file named manifest.yml for manifest-tool

    image: yen3/test:latest
    manifests:
      - image: yen3/test:amd64
        platform:
          architecture: amd64
          os: linux
      - image: yen3/test:arm64
        platform:
          architecture: arm64
          os: linux
      - image: yen3/test:arm32v7
        platform:
          architecture: arm
          os: linux
  2. Run manifest-tool with the config file to create the image.

    ./manifest-tool push from-spec ./manifest.yml

That's all. Now we have created a multi-arch docker image successfully.

In the future

The docker manifest command is still under discussion. The steps may change in the near future.


I build a yen3/docker-ubuntu-novnc docker image (original from fcwu/docker-ubuntu-vnc-desktop) with these concepts. It supports x86-64, arm and aarch64. You can see the Makefile, dockerfiles, manifest.yml to get more details.


  • walkerlee's help

LXD basic usage note


LXD is an extended version of LXC (LinuX Container). LXD provides a REST API and an easy-to-use interface to control containers. LXD is written in Go and uses the LXC shared library (liblxc) directly. LXD can provide live migration with criu (LXC cannot).

The major purpose of LXD/LXC is to provide a VM-like environment. You can control a container just like you controlled a VM before. Unlike Docker, you can use systemd in an lxd container. LXD also provides a well-defined networking interface.

Install lxd

I tested lxd on ubuntu 16.04 and 16.10. The versions shipped with ubuntu 16.04 and 16.10 are 2.0.7 and 2.4.0 respectively. The latest version (2017/03/06) is 2.10. Since LXD 2.3.0 there is a new lxc network command. You can get more details from the article. If you need the lxc network command, you can build lxd by yourself; it's not difficult to build. Otherwise you can just use the package from the package manager.

You can install both LXD and LXC since the two command-line interfaces are independent. The commands of LXD are lxc and lxd. The commands of LXC start with lxc-. You don't need to use both at the same time.

Install from package

  • In ubuntu 16.04 and 16.10
    sudo apt install -y lxd
    # sudo apt install -y criu  # Support live migration.
    sudo lxd init

Install from source

  • Environment: Ubuntu 16.04.2
  • Ref: LXD - build from source

  • Set these environment variables and add them to .bashrc or .zshrc

    export GOPATH=$HOME/usr/go
    export PATH=$PATH:$GOPATH/bin
    #export LXD_DIR=/home/yen3/usr/lxd/server
    #export LXD_CONF=/home/yen3/usr/lxd/config
  • Install prerequisites

    sudo apt-get install software-properties-common
    sudo add-apt-repository ppa:ubuntu-lxc/lxd-git-master
    sudo apt-get update
    sudo apt-get install acl dnsmasq-base git golang liblxc1 \
        lxc-dev make pkg-config rsync squashfs-tools \
        tar xz-utils
  • Build from source

    mkdir -p $GOPATH
    go get
    cd $GOPATH/src/
  • Save the script to /etc/init.d/launch-my-lxd

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          launch-my-lxd
    # Required-Start:    $all
    # Required-Stop:     $all
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Start my lxd
    # Description:       Start the latest lxd build by myself
    ### END INIT INFO

    set -e

    export GOPATH=/home/yen3/usr/go
    #export LXD_DIR=/home/yen3/usr/lxd/server
    #export LXD_CONF=/home/yen3/usr/lxd/config

    case "$1" in
      start)
        $GOPATH/bin/lxd --group lxd &
        ;;
      restart)
        /etc/init.d/launch-my-lxd stop
        /etc/init.d/launch-my-lxd start
        ;;
      stop)
        killall lxd
        ;;
      *) echo "Usage: $0 {start|stop|restart}" >&2; exit 1 ;;
    esac

    exit 0
  • Start the lxd daemon

    /etc/init.d/launch-my-lxd start
  • Init the lxd

    $GOPATH/bin/lxd init
  • Check that the lxd daemon is executing successfully.

    $ lxc list
    $ lxc --version
  • (Optional) Launch the lxd daemon automatically after reboot.

    sudo systemctl enable launch-my-lxd
  • (Memo) Stop the lxd daemon

    sudo systemctl stop launch-my-lxd


  • Command memo
    sudo lxd init                          # Init LXD environment
    lxc profile show default               # Show default profile
                                           # The current content of default profile is to define a network adapter
    lxc launch ubuntu:16.04 yen3-ubuntu    # Init and start the container named `yen3-ubuntu`.
    # lxc launch ubuntu:16.04 yen3-ubuntu -p default -p yen3
                                           # As above but with the `default` and `yen3` profiles.
    lxc init ubuntu:16.04 yen3-ubuntu      # Init the container but not start
    lxc list                               # List the status of all containers.
    lxc start yen3-ubuntu                  # Start the container
    lxc stop yen3-ubuntu                   # Shutdown the container
    lxc stop yen3-ubuntu --stateful        # Stop the container and save the state (need to `sudo apt install -y criu`).
                                           # This function may not work.
    lxc restart yen3-ubuntu                # Restart the container
    lxc delete yen3-ubuntu                 # Delete the container
    lxc file push source.file target.file  # Copy host's file to the container
    lxc exec yen3-ubuntu -- apt update     # Run command under the container directly
    lxc info yen3-ubuntu                   # Show the container info
    lxc config show yen3-ubuntu            # Show config
    lxc config edit yen3-ubuntu            # Edit config
    lxc config show default                # Show default config.
    lxc config device add yen3-ubuntu homedir disk source=/home/yen3 path=/home/ubuntu
                                           # Share host's folder to container
    # The configuration document:


Before using lxd, I had only a little knowledge about network settings. If you are very familiar with network settings, you can skip this section. Since LXD 2.3, LXD provides the lxc network command; you can read the article to get more details. I will not discuss that command here; I just learned and wrote down the basic usage.

The default networking mode of an LXD container is bridge mode. Besides that mode, LXD provides physical, vlan and macvlan modes.

  • bridge (default): LXD creates a NIC and connects it to the bridge (the bridge is created by sudo lxd init, which also runs dnsmasq to provide a DHCP service).
  • physical: Use the host NIC directly.
  • vlan: I have no idea how to use vlan Xd.
  • macvlan: simulates a different mac address on top of a host NIC. The NIC connects to the host network directly.

I give examples of using physical and macvlan.

  • physical mode

    • Assume the container's name is c1 and the enp0s8 NIC is unused in the host.
    • Add a NIC named eth1 to use enp0s8 and restart it.

      lxc launch ubuntu:16.04 c1
      lxc config device add c1 eth1 nic nictype=physical parent=enp0s8
      lxc restart c1
    • Run lxc exec c1 -- ip addr to check that the setting succeeded.

      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    • (Optional) Assuming the outer network uses DHCP, I just set eth1 to DHCP and restart the networking service.

      • Add this code segment to /etc/network/interfaces

        auto eth1
        iface eth1 inet dhcp
      • Restart the networking service and check the NIC setting.

        sudo systemctl restart networking
        ip addr
  • macvlan mode

    • I setup a VirtualBox VM to practice the mode.
    • The host NIC has to allow all connections (promiscuous mode), or the macvlan NIC will not work after the setting.


    • Assume the container's name is c1 and the enp0s3 is in the host.
    • Add a macvlan NIC for the container based on enp0s3

      lxc launch ubuntu:16.04 c1
      lxc config device add c1 eth1 nic nictype=macvlan parent=enp0s3
      lxc start c1
    • (Optional) We can also set the NIC to DHCP as above. Check the status of the NIC:

      16: eth1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 00:16:3e:4f:38:d1 brd ff:ff:ff:  ff:ff:ff link-netnsid 0
      inet brd scope global eth1
         valid_lft forever preferred_lft forever
      inet6 fe80::216:3eff:fe4f:38d1/64 scope link
         valid_lft forever preferred_lft forever


local (unix socket)

  • API spec:
  • Get container list
    $ curl --unix-socket /var/lib/lxd/unix.socket \
        -H "Content-Type: application/json" \
        -X GET \
    # Pretty print with `python -m json.tool`
    $ curl --unix-socket /var/lib/lxd/unix.socket \
        -H "Content-Type: application/json" \
        -X GET \
        lxd/1.0/containers | python3 -m json.tool
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   197  100   197    0     0   9244      0 --:--:-- --:--:-- --:--:--  9380
        "type": "sync",
        "status": "Success",
        "status_code": 200,
        "operation": "",
        "error_code": 0,
        "error": "",
        "metadata": [

TCP (https)

  • Ref: Directly interacting with the LXD API
  • Run these command first time

    $ sudo apt install -y jq
    $ lxc config set core.trust_password <some_password>
    $ curl -s -k --cert ~/.config/lxc/client.crt --key ~/.config/lxc/client.key -X POST -d '{"type": "client", "password": "some_password"}'
    $ curl -s -k --cert ~/.config/lxc/client.crt --key ~/.config/lxc/client.key | jq .metadata.auth
  • List all containers

    $ curl -s -k --cert ~/.config/lxc/client.crt --key ~/.config/lxc/client.key | jq .
      "type": "sync",
      "status": "Success",
      "status_code": 200,
      "operation": "",
      "error_code": 0,
      "error": "",
      "metadata": [


Run aarch64 program through qemu

The purpose of this post is to use qemu & gdb to observe the behaviour of an aarch64 program.

Recently I've needed to read and run assembly code constantly, but I don't really want to keep staring at x86-64 assembly XD. Since I happen to be working on aarch64 things lately, I simply picked aarch64 as the target.

An aarch64 compiler is not hard to find; you can use Linaro GCC. Barring surprises, I will probably build an FSF GCC myself later, which will be convenient if I ever get a chance to look at GCC internals.

As for qemu for aarch64: I'm not familiar with qemu, and sudo apt search qemu on Ubuntu 16.04 didn't seem to turn up aarch64 support (I'm not entirely sure), but building qemu yourself is not difficult, so I chose to build my own.

Build qemu for arm and aarch64

mkdir -p /home/yen3/tools/qemu/src
cd /home/yen3/tools/qemu/src
tar xf qemu-2.8.0.tar.bz2
mkdir -p build_qemu
cd build_qemu
../qemu-2.8.0/configure --prefix=$HOME/tools/qemu \
    --target-list=arm-linux-user,aarch64-linux-user
make install

Observe aarch64 through gdb and qemu

  1. Environment
    • Assume the aarch64 compiler is at /home/yen3/tools/xc/gcc621/bin/aarch64-linux-gnu-gcc
    • Assume qemu is at /home/yen3/tools/qemu/bin/qemu-aarch64
  2. Compile main.c into main.out (for aarch64)

    /home/yen3/tools/xc/gcc621/bin/aarch64-linux-gnu-gcc -g -O0 main.c -o main.out
  3. Run the program with qemu. One thing to note here: you need to set the sysroot with -L. If you downloaded Linaro GCC, remember to download the sysroot tarball as well; if you built the toolchain yourself, you should know where the sysroot is.

    /home/yen3/tools/qemu/bin/qemu-aarch64 \
      -L /home/yen3/tools/xc/gcc621 \
      -g 1234 \
      ./main.out
  4. Use aarch64-linux-gnu-gdb to connect to the gdb server

    /home/yen3/tools/xc/gcc621/bin/aarch64-linux-gnu-gdb -x test.gdb
    • The content of test.gdb is as follows
      target remote localhost:1234
      file ./main.out
      tui enable
      layout split
      b main
      set sysroot /home/yen3/tools/xc/gcc630/aarch64-linux-gnu

2017/01/26 Update

Over the past few days I tried building a debuggable aarch64-linux-gcc myself, and I wrote all the commands into a Makefile.

CC       = /home/yen3/tools/xc/gcc630/bin/aarch64-linux-gnu-gcc
GDB      = /home/yen3/tools/xc/gcc621/bin/aarch64-linux-gnu-gdb
QEMU_BIN = /home/yen3/tools/qemu/bin/qemu-aarch64
SYSROOT  = /home/yen3/tools/xc/gcc630/aarch64-linux-gnu

CFLAGS   = -ggdb -O0 -fno-builtin-printf
SRC      = main.c
BIN      = main.out

all:
	$(CC) $(CFLAGS) ${SRC} -o ${BIN}

run:
	QEMU_LD_PREFIX=$(SYSROOT) $(QEMU_BIN) -g 1234 ./${BIN} &
	$(GDB) -x test.gdb

clean:
	rm ${BIN}

I also updated test.gdb along the way; the version shown above is the updated one.

I forgot to build gdb, but no matter; for now I'm using the gdb from Linaro GCC, and I'll build my own when I have time in a few days.

A primer's note for parallel programming in Haskell

  • Functional Thursday #33
  • 2015.12.03
  • Yen3 (

About the slide

  • This slide deck contains my notes on Parallel and Concurrent Programming in Haskell, Chapters 1-5

  • Topics include

    • Eval monad
    • Par monad
    • Repa
  • All of today's content is related to data-level parallelism

  • In Haskell, a function that is relatively easy to parallelize is map; these slides discuss it frequently.

Definition of Parallel

  • A parallel program is one that uses a multiplicity of computational hardware (e.g., several processor cores) to perform a computation more quickly.
  • Parallel programming in Haskell is deterministic: The parallel program always produces the same answer, regardless of how many processors are used to run it.

The status of value in ghc (1/)

  • There are three possible states of a value.
    • Unevaluated
    • Weak-Head Normal Form (WHNF) - evaluated with first constructor
    • Normal Form (NF) - fully evaluated

The status of value in ghc (2/)

  • sprint - prints a value without forcing its evaluation
  • seq: evaluates only as far as the first constructor and doesn't evaluate any more of the structure; i.e. it evaluates its first argument to WHNF.
seq :: a -> b -> b

The status of value in ghc (3/)

  • Example
Prelude> let x = 1 + 2 :: Int
Prelude> let y = x + 1
Prelude> :sprint x
x = _
Prelude> :sprint y
y = _
Prelude> seq y ()
Prelude> :sprint x
x = 3
Prelude> :sprint y
y = 4

The status of value in ghc (4/)

Prelude> let xs = map (+1) [1..10] :: [Int]
Prelude> :sprint xs
xs = _
Prelude> seq xs ()
Prelude> :sprint xs
xs = _ : _
Prelude> length xs
Prelude> :sprint xs
xs = [_,_,_,_,_,_,_,_,_,_]
Prelude> sum xs
Prelude> :sprint xs
xs = [2,3,4,5,6,7,8,9,10,11]

force function

  • force - fully evaluates its argument and returns it (WHNF into NF)
import Control.DeepSeq

class NFData a where
    rnf :: a -> ()      -- reduce to normal-form
    rnf a = a `seq` ()

deepseq :: NFData a => a -> b -> b
deepseq a b = rnf a `seq` b 

force :: NFData a => a -> a   
force x = x `deepseq` x
  • seq: evaluates only as far as the first constructor and doesn't evaluate any more of the structure; i.e. it evaluates its first argument to WHNF.

Eval monad

  • Decoupling of the algorithm from the parallelism
  • The type declaration for eval monad
data Eval a
instance Monad Eval

runEval :: Eval a -> a
rpar :: a -> Eval a   -- rpar :: Strategy a 
rseq :: a -> Eval a   -- rseq :: Strategy a
  • rpar - evaluates its argument in parallel.
  • rseq - evaluates its argument and waits for the result.
    • It evaluates its argument only to WHNF.

Eval monad - simple example

  • Example
runEval $ do
    a <- rpar (f x)
    b <- rseq (f y)
    rseq a
    return (a, b)
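The example above leaves f, x, and y abstract. A self-contained sketch, assuming the parallel package is installed and using a naive fib as the stand-in expensive function:

```haskell
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- A deliberately expensive placeholder for `f`
fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Spark fib 30 in parallel, evaluate fib 25 on the current thread,
-- then wait for the sparked value before returning the pair.
pairSum :: Integer
pairSum = runEval $ do
    a <- rpar (fib 30)
    b <- rseq (fib 25)
    _ <- rseq a
    return (a + b)

main :: IO ()
main = print pairSum  -- 907065
```

Run with `+RTS -N2` to actually use two cores; the answer is the same either way, which is the determinism property noted earlier.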


Eval monad - Strategy

  • Strategy - modularizes parallel code by separating the algorithm from the parallelism
    • use the using function to add parallelism to existing code
    • withStrategy - another version of using with the arguments flipped
type Strategy a = a -> Eval a

using :: a -> Strategy a -> a
x `using` s = runEval (s x)

withStrategy :: Strategy a -> a -> a
withStrategy s x = runEval (s x)

Eval monad - Strategy

  • rpar and rseq are also Strategies.
rpar :: Strategy a
rseq :: Strategy a
  • Ideally, you can write the algorithm first and add the parallelism code later.

Eval monad - example for pair

  • Example
import Control.Parallel.Strategies
import Control.DeepSeq

evalPair :: Strategy a -> Strategy b -> Strategy (a,b)
evalPair sa sb (a,b) = do
    a' <- sa a
    b' <- sb b
    return (a',b')

Eval monad - example for pair

rparWith :: Strategy a -> Strategy a
rparWith s a = do
        ra <- rpar a
        sa <- s ra
        return sa

(+) 1 2             -- (1-1)  
((+) 1 2, (+) 3 4)  -- (1-2)

(+) 1 2 `using` rpar   -- (2-1)
((+) 1 2, (+) 3 4) `using` evalPair (rparWith rseq) (rparWith rseq)  -- (2-2)
  • (1-1), (1-2) - sequential version
  • (2-1), (2-2) - parallel version and reduce the value to WHNF
  • (3-1), (3-2) - parallel version, reducing the value to NF - the parTuple2 and evalPair functions are the same
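To make the (2-2) pair example concrete, here is a self-contained sketch (assuming the parallel package; evalPair is repeated from the slide above):

```haskell
import Control.Parallel.Strategies (Strategy, rparWith, rseq, using)

-- Apply one Strategy per component of the pair.
evalPair :: Strategy a -> Strategy b -> Strategy (a, b)
evalPair sa sb (a, b) = do
    a' <- sa a
    b' <- sb b
    return (a', b')

-- Both components are sparked and reduced to WHNF in parallel.
result :: (Int, Int)
result = ((+) 1 2, (+) 3 4) `using` evalPair (rparWith rseq) (rparWith rseq)

main :: IO ()
main = print result  -- (3,7)
```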

Eval monad - some helper functions (1/2)

  • Some helper functions
    • rdeepseq - evaluates the argument to NF
rdeepseq :: NFData a => Strategy a
rdeepseq x = rseq (force x)
    • rparWith - wraps the Strategy s in an rpar
rparWith :: Strategy a -> Strategy a 

Eval monad - some helper functions (2/2)

  • The code reduced to NF in previous slide could also be written as
-- NF 
(+) 1 2 `using` rparWith rdeepseq 
((+) 1 2, (+) 3 4) `using`
    evalPair (rparWith rdeepseq) (rparWith rdeepseq)

Eval monad - parallelize map

parMap :: (a -> b) -> [a] -> [b]
parMap f xs = map f xs `using` parList rseq

evalList :: Strategy a -> Strategy [a]
evalList strat [] = return []
evalList strat (x:xs) = do
    x' <- strat x
    xs' <- evalList strat xs
    return (x' : xs')

parList :: Strategy a -> Strategy [a]
parList strat = evalList (rparWith strat)
  • parMap evaluates each list element only to WHNF
  • parList - evaluates the list elements in parallel

Eval monad

  • Example
import Control.Parallel.Strategies
import Control.DeepSeq

map (+1) [1..100]  -- (1) 
map (+1) [1..100] `using` parList rseq -- (2)
map (+1) [1..100] `using` parList rdeepseq  -- (3)
  • (1) sequential version
  • (2) parallel version, reducing values to WHNF
  • (3) parallel version, reducing values to NF

Example - Mandelbrot set

  • You could get more details from my blog post.

  • some type definitions

type Point = (Double, Double)
type Range = (Double, Double)
type Plane = (Range, Range)
  • sequential version
planePoints :: Plane -> V.Vector Point

mandelSet :: Plane -> V.Vector Point
mandelSet = planeToMandelPoints

Example - Mandelbrot set

  • basic parallel version with parList
splitPlane :: Integer -> Plane -> [Plane]

mandelSetStart :: Integer -> Plane -> V.Vector Point
mandelSetStart size p = V.concat
    (map planeToMandelPoints (splitPlane size p)
     `using` parList rseq)
  • On a late-2010 MBP15 (Intel Core i5 2.4 GHz, 8 GB)
    • sequential - about 45 secs
    • running on 2 cores - about 25 secs (./Mandelbrot par 100 +RTS -N2 -s)

Par Monad

  • Goal
    • be more explicit about granularity and data dependencies
    • avoid reliance on lazy evaluation, but without sacrificing the determinism that we value for parallel programming
    • The parallel computations are pure (and deterministic)
  • The library is implemented entirely in Haskell
    • so you can accommodate alternative scheduling strategies.

Par Monad

  • Par monad - a monad for parallel computation
newtype Par a

instance Applicative Par
instance Monad Par

runPar :: Par a -> a    -- produce a pure result.
fork :: Par () -> Par () -- the way to create parallel tasks
  • IVar - results are communicated through IVars
data IVar a 

new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a

Par Monad

  • IVar
data IVar a 

new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a
  • IVar -- a box that starts empty
  • put -- stores a value in the box
    • All values communicated through IVars are fully evaluated. There is a head-strict version, put_.
  • get -- reads the value; if the box is empty, it waits until the box is filled by put. get does not remove the value from the box: once the box is full, it stays full.

Par Monad

  • Example
runPar $ do
    i <- new
    j <- new
    fork (put i (fib n))
    fork (put j (fib m))
    a <- get i
    b <- get j
    return (a+b)


Par Monad

  • spawn - like fork, but returns an IVar that can be used to query the result of the forked computation. spawn therefore provides futures or promises.
  • parMap - a parallel version of map implemented with the Par monad
spawn :: NFData a => Par a -> Par (IVar a)
spawn p = do
    i <- new
    fork (do x <- p; put i x)
    return i

parMap :: NFData b => (a -> b) -> [a] -> Par [b]
parMap f as = do
    ibs <- mapM (spawn . return . f) as
    mapM get ibs
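A minimal usage sketch, assuming the monad-par package (which exports runPar and parMap as above):

```haskell
import Control.Monad.Par (runPar, parMap)

-- Each element is doubled in its own parallel task;
-- results are fully evaluated (NFData) before being returned.
doubled :: [Int]
doubled = runPar (parMap (* 2) [1 .. 10])

main :: IO ()
main = print doubled  -- [2,4,6,8,10,12,14,16,18,20]
```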

Example - prime number

  • Example
    • primeIntVector - Eval monad
    • primeIntVector' - Par monad
primeIntVector :: Int -> VU.Vector Int
primeIntVector n =
    let ls = genNumberRange 0 n 100
    in  VU.concat (map (uncurry primes) ls `using` parList rseq)

primeIntVector' :: Int -> VU.Vector Int
primeIntVector' n =
    let ls = genNumberRange 0 n 100
    in  VU.concat $ Par.runPar $ Par.parMap (uncurry primes) ls

Difference between Par and Eval

  • Par Monad
    1. Always evaluates its values to normal form, avoiding the pitfalls of weak-head normal form
    2. Calling runPar costs more than calling runEval
    3. Easy to redefine the scheduling strategy

Difference between Par and Eval

  • Eval Monad

    1. Needs the force function to take a value from weak-head normal form to normal form; it is well suited to lazy data structures
    2. Calling runEval is essentially free
    3. Provides speculative parallelism
    4. Shows more diagnostics in ThreadScope than the Par monad does
  • Reference


Repa

  • Repa - REgular PArallel arrays
  • Goal
    • efficient numerical array computations in Haskell, run in parallel
  • It provides efficient computation over unboxed data, which the Par monad and Strategies do not
    • Repa also supports boxed data.

Repa - type

  • The array type
data Array r sh e
  • r - representation type
  • e - element type
  • sh - the shape of array (the dimension(s) of array)
data Z = Z    -- scalar data
data tail :. head = tail :. head

type DIM0 = Z
type DIM1 = DIM0 :. Int
type DIM2 = DIM1 :. Int

Repa - array

  • How do you create an array with the Array type?
    • fromListUnboxed - from a list of an unboxed type
    • fromUnboxed - from a Data.Vector.Unboxed vector
    • fromFunction - from a shape and a function generating each element
    • ... etc
fromListUnboxed :: (Shape sh, Unbox a) => sh -> [a] -> Array U sh a
fromFunction :: sh -> (sh -> a) -> Array D sh a
fromUnboxed :: (Shape sh, Unbox e) => sh -> Vector e -> Array U sh e

Repa - create array example

  • Example - create an array
Prelude> import Data.Array.Repa as R
Prelude R> let a = R.fromListUnboxed (Z :. 10) [1..10] :: Array U DIM1 Int
Prelude R> :t a 
a :: Array U DIM1 Int

Prelude R> let b =  R.fromFunction (Z :. 10) (\(Z :. i) -> i + 1 :: Int)
Prelude R> :t b
b :: Array D (Z :. Int) Int

Prelude R> import qualified Data.Vector.Unboxed as VU
Prelude R VU> let v = VU.enumFromN 1 10 :: VU.Vector Int
Prelude R VU> let c = R.fromUnboxed (Z :. (VU.length v)) v
Prelude R VU> :t c
c :: Array U (Z :. Int) Int

Repa - array computation

  • Array computations (e.g.,, +^) return a delayed array (Array D sh e)
 :: (Shape sh, Source r a) =>
     (a -> b) -> Array r sh a -> Array D sh b

(+^) :: (Num c, Shape sh, Source r1 c, Source r2 c) =>
     Array r1 sh c -> Array r2 sh c -> Array D sh c

Repa - compute

  • computeS - evaluates the array operations sequentially.
  • computeP - the same as computeS, but in parallel.
    • the purpose of the monad is only to ensure that computeP operations are performed in sequence and not nested.
      • the simplest way to discharge the monad is runIdentity
      • see p. 94 of the book for more information
computeS :: (Load r1 sh e, Target r2 e) => Array r1 sh e -> Array r2 sh e
computeP :: (Monad m, Source r2 e, Target r2 e, Load r1 sh e) =>
    Array r1 sh e -> m (Array r2 sh e)
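A small sketch tying the two together, assuming the repa package; the array here is illustrative:

```haskell
{-# LANGUAGE TypeOperators #-}
import Data.Array.Repa as R

main :: IO ()
main = do
    let a = fromListUnboxed (Z :. 5) [1 .. 5] :: Array U DIM1 Int
        d = (+ 1) a                     -- delayed array (Array D DIM1 Int)
    -- Sequential: no monad needed.
    print (toList (computeS d :: Array U DIM1 Int))
    -- Parallel: computeP runs in any monad; IO works here.
    b <- computeP d :: IO (Array U DIM1 Int)
    print (toList b)
```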

Repa - array computation example

  • calculate $e^x = \sum^{\infty}_{n=0}\frac{x^n}{n!} \forall x$
import Data.Array.Repa as R
import Control.Monad.Identity

fact :: Double -> Double
fact x = foldr (*) 1 [1..x]

enumN :: Int -> Array D DIM1 Double
enumN n = R.fromFunction (Z :. n) (\(Z :. i) -> fromIntegral i)

exp' :: Int -> Double
exp' x = let
             ns = enumN 100
             ys = (\n -> (fromIntegral x ** n) / fact n) ns
         in  runIdentity . R.sumAllP $ ys

Repa - example

  • Example - prime numbers
primeArray :: Int -> VU.Vector Int
primeArray n = let
                   a = genArray n
                   ps = runIdentity . Repa.computeUnboxedP . primeArrayCheck $
                        a :: Array U DIM1 Int
               in  VU.filter (/=0) (Repa.toUnboxed ps)


Summary

  • The simplest parallel method - parallel map

    • use parMap or parList
  • Repa is especially useful for numeric calculation.

  • The remaining part of the book is about concurrency.

  • Is Bool an unboxed type?