Notes on The Raft Consensus Algorithm

What’s consensus?

It allows a collection of machines to work as a coherent group that can survive the failures of some of its members.

It means not only that a group of machines reaches a final decision for each request, but also that the state machine is replicated across these machines, so that the failure of some members does not affect the service. Raft is a consensus algorithm that aims to be correct, implementable, and understandable.

The thesis is very well written and much more comprehensive than the original conference paper. Implementing Raft based on the thesis shouldn't be too difficult (though certainly not trivial). The author also built a website collecting all kinds of helpful material. I read the thesis and decided to take some notes here.

There are two key parts at the core of the algorithm:

Leader election

The election is triggered by a timeout. If a server fails to detect heartbeats from the current leader, it starts an election for a new term. During the election, it broadcasts requests to collect votes from the other servers. If a majority of the servers grant their votes, the server becomes the leader for this term. The "term" here is a monotonically increasing logical clock. From the perspective of a server receiving a vote request, whether to grant the vote depends on a few considerations. First of all, if the sender's log falls behind the receiver's, the receiver should not vote for it. Also, if the receiver can still hear heartbeats from the current leader, it should not vote either; in that case, the requester might be a disruptive server. Otherwise, the receiver should vote for the sender.
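
To make the description concrete, here is a minimal, illustrative sketch in Java of the vote-granting decision; it is not taken from any particular implementation, and all names and the timeout constant are placeholders made up for this note.

// Illustrative sketch only: the vote-granting checks described above.
public class VoteDecision {

    static class ServerState {
        long currentTerm;          // latest term this server has seen
        String votedFor;           // candidate voted for in currentTerm, or null
        long lastLogTerm;          // term of the last entry in the local log
        long lastLogIndex;         // index of the last entry in the local log
        long lastHeartbeatMillis;  // when the last leader heartbeat was received
    }

    static final long ELECTION_TIMEOUT_MILLIS = 300; // arbitrary placeholder

    static boolean grantVote(ServerState me, String candidateId, long candidateTerm,
                             long candidateLastLogTerm, long candidateLastLogIndex) {
        // Reject requests from stale terms.
        if (candidateTerm < me.currentTerm) {
            return false;
        }
        // If we still hear heartbeats from a live leader, the candidate is likely disruptive.
        if (System.currentTimeMillis() - me.lastHeartbeatMillis < ELECTION_TIMEOUT_MILLIS) {
            return false;
        }
        // Only vote for a candidate whose log is at least as up-to-date as ours.
        boolean logUpToDate = candidateLastLogTerm > me.lastLogTerm
                || (candidateLastLogTerm == me.lastLogTerm
                    && candidateLastLogIndex >= me.lastLogIndex);
        if (!logUpToDate) {
            return false;
        }
        // Grant at most one vote per term.
        return !(candidateTerm == me.currentTerm && me.votedFor != null
                && !me.votedFor.equals(candidateId));
    }
}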

Log replication

Once a server becomes the leader, its mission is simply to replicate its log to every follower, that is, to make each follower's log exactly the same as its own. For each leader-follower pair, the leader first identifies the highest index at which the two logs agree. Starting from there, the leader overwrites the follower's log with its own. The leader handles all requests from clients. Once it receives a new request, it first appends the request to its own log and then replicates it to all followers. If a majority of the servers (including the leader itself) acknowledge the replication, the leader applies the request to its state machine (this is called a commit). The leader includes the latest committed index in its heartbeats, so followers learn that the request has been committed, after which each follower commits (applies) the request as well.
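
As a rough illustration of the commit rule, the sketch below shows how a leader could advance its commit index from the followers' acknowledgements; it is a toy example with made-up names, and a real implementation additionally checks that the entry being committed belongs to the leader's current term.

import java.util.Map;

// Illustrative sketch: an entry is committed once a majority of the cluster stores it.
public class CommitRule {
    // matchIndex maps each follower to the highest log index known to be replicated there.
    static long newCommitIndex(Map<String, Long> matchIndex, long leaderLastIndex,
                               long currentCommitIndex, int clusterSize) {
        long committed = currentCommitIndex;
        for (long candidate = currentCommitIndex + 1; candidate <= leaderLastIndex; candidate++) {
            int replicas = 1; // the leader itself counts
            for (long idx : matchIndex.values()) {
                if (idx >= candidate) {
                    replicas++;
                }
            }
            if (replicas * 2 > clusterSize) {
                committed = candidate; // majority reached: safe to apply to the state machine
            } else {
                break;
            }
        }
        return committed;
    }
}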

A more formal description of core Raft can be found in Fig. 3.1 of the thesis. There are also a few extensions that make the algorithm practical for production systems, such as cluster membership changes. I also found Fig. 10.1 a very good architecture reference.

There are quite a lot of implementations of Raft, which can be found here. I also found a project named Copycat, with code here and documentation here. Copycat is a full-featured implementation of Raft in Java, and building your own application on top of it shouldn't be too difficult. They provide an example of implementing a KV store on Copycat in their source code here, which is used as the "Get Started" tutorial. Another important reason why I consider Copycat a good reference is that it emphasizes the abstractions of state machine, client, server, and operations. Going through its documentation enhanced my understanding of Raft.

If you don't want to build your own Raft, Copycat may be worth a try, though I have no real experience with it beyond a toy project.

The annotated thesis can be found here.

A walk-through case for understanding

A typical request handling process is as follows:

  1. The client sends a request to the cluster;
  2. The leader handles the request by appending it to its write-ahead log (WAL);
  3. The leader sends the request to all followers;
  4. Each follower puts the received request into its WAL and responds to the leader;
  5. Once the leader has heard from a majority of its followers, it commits the request by applying the WAL entry to its state machine;
  6. The leader informs the client that the request has been handled properly, and then puts the index of the request into its heartbeat so that all followers know the status of each request;
  7. Once a follower knows that the request has been committed by the leader, it commits the request too by applying it to its own state machine.

There are a few key points to understand in the process above:

1. Does the client always know whether its request has been handled properly?

No. If the leader commits the request and then crashes, the client will not know whether the request was actually handled successfully. In some cases, the client will resend the request, which may lead to duplicated data. It is left to the client to avoid this kind of duplication.

2. What if the leader crashes before informing its followers that the request has been committed?

If the leader crashes, one of the followers will be elected as the next leader. By Raft's election rules, that follower must have the most up-to-date log. Therefore, the next leader definitely has the WAL entry for the request, and the request has definitely been replicated to a majority of the hosts, so it is safe for the new leader to replicate its state to all followers.

3. What is the key feature of a consensus algorithm (or of strong consistency)?

Under normal conditions, every state change is handled by a single designated node, and the change must be replicated to a majority of the followers before the requester is told it succeeded. Each read request goes to that designated node as well. When too many nodes fail or the network partitions, the service stops working until the cluster returns to a normal state.

An Algorithm for Deduplicating File Locks

In one of my recent projects, I needed to implement a lock service for files, quite similar to the lock service of existing file systems. It has to support two modes: write (exclusive) and read (shared). For instance, "R/a/b/c" means a read lock on the file with path "/a/b/c". Likewise, "W/a/b/c" stands for a write lock.

The interface is pretty straightforward. The user provides a set of locks, e.g., {R/a/b/c, R/a/b/d}, to be acquired by our service. One challenge we face is deduplicating the locks given by the user. For instance, the user may provide the set {R/a/b/c, W/a/b/c}. Since the write lock is stronger than the read lock, this set is equivalent to {W/a/b/c}. We want to reduce the number of locks, since fewer locks mean fewer conflict checks in the later steps.

So far, the deduplication problem sounds quite simple, since we only need to consider locks with exactly the same path. However, we also need to support a wildcard notation, because we may have a large number of files under one directory; locking all files in such a directory would otherwise require generating a huge list of locks, one per file. With a wildcard, we lock all files under "/a/b/" with "R/a/b/*". The wildcard makes the deduplication problem much more complicated.

We first clarify which locks cover which with a concrete example:

W/a/b/* > W/a/b/c > R/a/b/c

Given a large set of locks, a very simple algorithm is to compare each pair of locks: if one is stronger than the other, the weaker one is thrown away. The resulting set contains only locks that are not covered by any other lock. This algorithm's complexity is O(n*n), which is slow when the given set is large, so I tried to find a faster algorithm.

After some thought, I developed an algorithm with complexity O(n), which constructs a tree over the set of locks. Each tree node has the following attributes:

class Node {
    String name;                    // the name of the node
    String mode;                    // "W" or "R"
    Node parent;                    // the parent of the node
    boolean mark = false;           // mark == true if there's at least one W node in its subtree
    Map<String, Node> children = new HashMap<>(); // the children of this node
    ...
}

We now explain how to construct a lock tree with a concrete example.

In the figure above, we show how a set of locks is inserted into an empty lock tree. Each node is represented by a string "mode:name:mark".

Step 1~3: Inserting the first three locks {R/a/b/c, R/a/b/d, R/a/e/f} is straightforward.
Step 4: Inserting the 4th lock, i.e., "W/a/b/c", has two effects: (1) it upgrades the original "R" lock on /a/b/c to "W" mode; (2) it flips the mark of every node on the path "/a/b/c" from 0 to 1.
Step 5: The 5th lock, "R/a/b/e", is handled the same way as the first three.
Step 6~7: The 6th lock, "R/a/b/*", contains a wildcard. Inserting an "R" wildcard means replacing all unmarked nodes ("R:e:0" and "R:d:0") in its parent's subtree (the parent is "R:b:1" in this case) with a single "R:*:0" node. Similarly, inserting "R/a/*/*" first removes all unmarked nodes in "R:a:1"'s subtree, i.e., "R:e:0", "R:*:0", and "R:f:0", and then inserts the new wildcard nodes.
Step 8: Finally, inserting a "W" mode lock with a wildcard means deleting all nodes in the subtree of "R:a:1" (the wildcard node's parent) and then inserting the new nodes. A rough sketch of the insertion step follows.
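
For concreteness, below is a rough and intentionally incomplete Java sketch of the per-component insertion step, covering only the cases in the walk-through (R-to-W upgrade, marking the path, and a wildcard replacing the siblings it covers); the class mirrors the Node definition above, and, as noted below, the full algorithm has more cases.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of inserting one path component into the lock tree.
class LockNode {
    String name;
    String mode;                  // "W" or "R"
    LockNode parent;
    boolean mark = false;         // true if there is at least one W node in the subtree
    Map<String, LockNode> children = new HashMap<>();

    LockNode(String name, String mode, LockNode parent) {
        this.name = name;
        this.mode = mode;
        this.parent = parent;
    }

    LockNode insertChild(String childName, String childMode) {
        if (childName.equals("*")) {
            if ("W".equals(childMode)) {
                children.clear();                         // a W wildcard covers everything below
            } else {
                children.values().removeIf(c -> !c.mark); // an R wildcard covers unmarked subtrees only
            }
        }
        LockNode child = children.get(childName);
        if (child == null) {
            child = new LockNode(childName, childMode, this);
            children.put(childName, child);
        } else if ("W".equals(childMode)) {
            child.mode = "W";                             // upgrade R to W on an exact match
        }
        if ("W".equals(childMode)) {
            for (LockNode n = child; n != null; n = n.parent) {
                n.mark = true;                            // mark the whole path up to the root
            }
        }
        return child;
    }
}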

After constructing the lock tree, each path from the root to a leaf is a lock in our deduplicated result set. In practice, a hashmap already handles most duplicate cases, and we rarely encounter extreme cases like the one shown above, so the algorithm briefly described here is mostly for fun :). I didn't discuss paths of different lengths since they can be put into separate trees, so they are not a real problem.

I didn't elaborate all the cases in the example above; the full algorithm is more complicated. A sample implementation can be found here. The output is as follows:

R/a/b/c
ROOT: abc[R]
R/a/b/d
ROOT: abc[R]
ROOT: abd[R]
R/a/e/f
ROOT: abc[R]
ROOT: abd[R]
ROOT: aef[R]
W/a/b/c
ROOT: abc[W]
ROOT: abd[R]
ROOT: aef[R]
R/a/b/e
ROOT: abc[W]
ROOT: abd[R]
ROOT: abe[R]
ROOT: aef[R]
R/a/b/*
ROOT: abc[W]
ROOT: ab*[R]
ROOT: aef[R]
R/a/*/*
ROOT: abc[W]
ROOT: a**[R]
W/a/*/*
ROOT: a**[W]

Notes on Designing an Admission Control System

An admission control system takes care of two parts: actions and resources. As a concrete example, assume you are a sales manager. A typical operation is viewing the orders made by all the salespeople in your group. Here, viewing the orders is the "action" part, while the orders made by your group are the "resource" part.

Given a user account, an admission control system tells which actions the user can take on which resources. For a website, an action is usually represented by a URI. For instance,

"/orders/view" -> view the orders
"/user/delete" -> delete a user
"/user/update" -> update a user's profile

A typical way of managing the actions is through a layered representation of users, roles, and permissions. A user belongs to a role, and a role is composed of a set of permissions. Each permission is linked to a URI, for instance. For convenience, people usually add extra layers to this three-layer structure. For example, a "permission group" layer makes it easier for the administrator to assign permissions when the number of permissions is large.

user -> role -> permission

Representing the resource is nontrivial. Before doing so, you need domain knowledge to understand the structure of the resource. In our sales manager example, the resources are orders, which can be managed in a tree. The manager can see all orders within her subtree, while a salesperson cannot see orders from her siblings. In this case, the resources are organized in a tree as follows.

manager -- sales man 1
|
-- sales man 2
|
-- sales man 3 -- agent 1
|
-- agent 2

You may maintain multiple trees, one per logical organization, for instance, one tree for sales and another for operators. Each order is linked to a node in a tree. When an action arrives, e.g., "/orders/view/1" where "1" is the order id, we look up in the DB which node this order is linked to. If the node is within the user's subtree, the request is granted; otherwise it is denied. A minimal sketch of this check follows.
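
As a minimal sketch of that check (OrgNode and the example hierarchy below are hypothetical stand-ins for whatever your DB stores), the decision can simply walk from the resource's node up to the root and see whether it passes the user's node:

// Illustrative sketch: grant access only if the resource's node is inside the user's subtree.
public class AdmissionCheck {

    static class OrgNode {
        String id;
        OrgNode parent;

        OrgNode(String id, OrgNode parent) {
            this.id = id;
            this.parent = parent;
        }
    }

    static boolean canAccess(OrgNode userNode, OrgNode resourceNode) {
        for (OrgNode n = resourceNode; n != null; n = n.parent) {
            if (n.id.equals(userNode.id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        OrgNode manager = new OrgNode("manager", null);
        OrgNode sales3 = new OrgNode("sales man 3", manager);
        OrgNode agent1 = new OrgNode("agent 1", sales3);

        System.out.println(canAccess(manager, agent1)); // true: agent 1 is in the manager's subtree
        System.out.println(canAccess(agent1, sales3));  // false: the order belongs to a node above agent 1
    }
}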

Last but not least, be sure to use a whitelist for the super user. In other words, the super user should not go through the admission control system. Otherwise, when the admission control system is down, you can do nothing at all because nobody has permission.

The Crowd (乌合之众)

To be precise, 乌合之众 is the Chinese title of "The Crowd: A Study of the Popular Mind" by Gustave Le Bon, a French author who died in 1931. The edition I read was published by Guangxi Normal University Press and translated by Feng Keli. Interestingly, the main text runs to only 183 pages in a small (32mo) format, and pages 1 to 29 are a commentary written by Robert K. Merton, so the book comes with a very thorough review before the main text even begins.

The commentary's verdict on The Crowd is quite conflicted: it praises the author's penetrating powers of observation while criticizing how deeply his ingrained prejudices color the work. I believe the praise far outweighs the criticism. In fact, if a book reveals even one piece of hidden knowledge, it is already remarkable, even if it carries many obvious errors. The Crowd is exactly that kind of book. In its hundred-odd pages, the author really wants to convey only one point: the mind of a crowd is different from the mind of an individual. More precisely, the thinking of a crowd is chaotic, wildly imaginative, contagious, extreme, crude and simplistic, exaggerated, and self-contradictory; compared with an individual in a civilized society, the crowd behaves like an individual from a barbarous one.

The Crowd contains many passages illustrating each of the adjectives I listed above, for example: "what move the hearts of crowds are the heroes of myth, not the real heroes of the moment"; "the unanimous tendency of a crowd's sentiments immediately becomes an accomplished fact"; "crowds always lag several generations behind scholars and philosophers." Especially classic is this: "the characteristic of crowd reasoning is to lump together things that are different and only superficially alike, and to generalize immediately from particular cases", for instance, "a laborer exploited by his employer immediately concludes that all employers in the world are exploiters."

The book was completed more than a century ago, yet the phenomena it describes keep recurring. The various movements in China over the past two hundred years, and the peasants, students, workers, and merchants who took part in them, are all characters straight out of The Crowd. They are full of passion, forever attacking those who differ under slogans of nation and morality, and they resort to the most barbaric and violent means. They seem to possess boundless power that pushes society and history forward, yet in reality they are merely puppets at the mercy of others, pawns deployed to serve someone else's ends.

What is galling is that such scenes still play out again and again today. People are no calmer or wiser than their forebears of one or two hundred years ago, and some schemers still try to exploit this trait of crowds; at the slightest provocation people will, like a mob, indiscriminately destroy everything (often including the very instigators of these movements). As individuals we are far too weak in the face of the crowd. We cannot change anything, we cannot even protect ourselves; all we can do is stay clear of the disasters the crowd brings, disasters that no natural catastrophe can match!

Running MapReduce over OpenStack Swift

I recently got a chance to experiment with MapReduce over several open-source object stores. OpenStack Swift is certainly one of the well-known object storage services. However, I found it non-trivial to set up for evaluation, so I put some notes below.

This article only covers Ubuntu 14.04 with the Swift Liberty release. I noticed that newer versions of Swift come with much better documentation, which is much easier to follow. The appendix contains my early attempts at installing Swift from source.

Read the following first

Make sure you have some sense of the concepts, including user, role, tenant, project, endpoint, proxy server, account server, object server, rings, etc.

To my understanding, they are:

  • user: a real user, or a service
  • role: the role of a user, corresponding to a set of permissions
  • tenant = project: a set of users
  • endpoint: the entry-point URLs of OpenStack services
  • proxy server: handles user requests in Swift
  • account server: handles user accounts in Swift, also used as the domain or primary namespace
  • object server: handles object storage in Swift
  • rings: the consistent-hashing structures Swift uses to map data to storage devices
  • keystone: the authentication service in OpenStack. Key concepts of Keystone can be found here

Set up Keystone and Swift

Install dependencies

  • Install MariaDB (or MySQL) and memcached following this page

  • Install Keystone following page1 and page2. Note that if you want to run MapReduce over Swift, you cannot use the TempAuth approach. Read this page for more details.

  • Install Swift following page1, page2, and page3. You can start the Swift services with

swift-init all start

Set up Hadoop

  • Set up Hadoop with version >= 2.3 and configure it following page1 and page2.
  • Make sure SwiftNativeFileSystem is in the classpath; read this page if you run into any problem.
  • Configure etc/hadoop/core-site.xml, adding the following contents:
<configuration>
<property>
<name>fs.swift.impl</name>
<value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
</property>
<property>
<name>fs.swift.service.SwiftTest.auth.url</name> <---SwiftTest is the name of the service
<value>http://127.0.0.1:5000/v2.0/tokens</value> <---the ip:port should point to keystone. Make sure having "/tokens" there.
</property>
<property>
<name>fs.swift.service.SwiftTest.auth.endpoint.prefix</name>
<value>/AUTH_</value>
</property>
<property>
<name>fs.swift.service.SwiftTest.http.port</name>
<value>8080</value> <--- the same with that in proxy-server.conf
</property>
<property>
<name>fs.swift.service.SwiftTest.region</name>
<value>RegionOne</value>
</property>
<property>
<name>fs.swift.service.SwiftTest.public</name>
<value>true</value>
</property>
<property>
<name>fs.swift.service.SwiftTest.tenant</name>
<value>admin</value> <--- name of the project, not the role!
</property>
<property>
<name>fs.swift.service.SwiftTest.username</name>
<value>admin</value> <--- user name
</property>
<property>
<name>fs.swift.service.SwiftTest.password</name>
<value>adminpassword</value> <--- password
</property>
</configuration>

Verify the setup

  • create a container (swift post nameofcontainer). Important! The name should consist only of letters; no '_' is allowed.
  • upload a file (swift upload nameofcontainer nameoffile).
  • hdfs dfs -ls swift://nameofcontainer.SwiftTest/. This should show the files you uploaded previously.
  • hadoop distcp nameoffile swift://mycontainer.SwiftTest/nameoffile
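
If you prefer to verify from code rather than the CLI, a small Hadoop client along the following lines should also work, assuming the hadoop-openstack jar (which provides SwiftNativeFileSystem) is on the classpath and the core-site.xml above (with the SwiftTest service name) is picked up; this is just a sanity-check sketch, not part of the official setup.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists the contents of the Swift container through Hadoop's FileSystem API.
public class SwiftListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(URI.create("swift://nameofcontainer.SwiftTest/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}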

Appendix: Install Swift from source

Dependencies

From the swift book

apt-get install git curl gcc memcached rsync sqlite3 xfsprogs \
git-core libffi-dev python-setuptools
apt-get install python-coverage python-dev python-nose \
python-simplejson python-xattr python-eventlet \
python-greenlet python-pastedeploy python-netifaces \
python-pip python-dnspython python-mock

Install the latest liberasurecode (according to some posts, this may not be necessary)

sudo apt-get install build-essential autoconf automake libtool
cd /opt/
git clone https://github.com/openstack/liberasurecode.git
cd liberasurecode
./autogen.sh
./configure
make
make test
sudo make install

Install the Swift CLI

cd /opt
git clone https://github.com/openstack/python-swiftclient.git
cd /opt/python-swiftclient; sudo pip install -r requirements.txt;
python setup.py install; cd ..

I ran into a problem with an invalid format in requirements.txt. You may need to do some editing in that case.

Install Swift

cd /opt
git clone https://github.com/openstack/swift.git
cd /opt/swift ; sudo python setup.py install; cd ..

The above is from the Swift book, but I realized I still needed to run "sudo pip install -r requirements.txt" before the setup.

Another issue I found is that pip install takes forever on the cryptography package in requirements.txt. I searched on Google, where someone said it took 20 minutes to build; for me, it still hung there after 20 minutes.

I tried installing all of the dependencies manually, excluding cryptography itself, as shown below. After that, cryptography installed successfully:

Requirement already satisfied (use --upgrade to upgrade): cryptography in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): idna>=2.0 in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): pyasn1>=0.1.8 in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): six>=1.4.1 in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): setuptools>=11.3 in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): enum34 in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): ipaddress in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): cffi>=1.4.1 in /usr/local/lib/python2.7/dist-packages (from cryptography)
Requirement already satisfied (use --upgrade to upgrade): pycparser in /usr/local/lib/python2.7/dist-packages (from cffi>=1.4.1->cryptography)

Copy in Swift configuration files

mkdir -p /etc/swift
cd /opt/swift/etc
cp account-server.conf-sample /etc/swift/account-server.conf
cp container-server.conf-sample /etc/swift/container-server.conf
cp object-server.conf-sample /etc/swift/object-server.conf
cp proxy-server.conf-sample /etc/swift/proxy-server.conf
cp drive-audit.conf-sample /etc/swift/drive-audit.conf
cp swift.conf-sample /etc/swift/swift.conf

chown -R swift:swift /etc/swift

In this setup, we rely on tempauth for authentication.

Configure swift.conf

Setting the hash path prefix and suffix

These two random strings are used to determine the hash results of the partitions on the rings. They are supposed to be kept confidential.

swift_hash_path_suffix = random_32_length_string1
swift_hash_path_prefix = random_32_length_string2

You can use

head -c 32 /dev/random | base64

to get one random 32-character string.

The storage policy

You can customize the storage policy as per the Swift book, but it is optional.

Build the rings

The three parameters are part_power, replicas, and min_part_hours:

  • part_power: controls the number of partitions in the storage cluster (2^part_power). The typical setting is log2(100 * maximum number of disks). For this setup, it is log2(100 * 1), which is close to 7.
  • replicas: in this setup, there is only one drive, so only 1 replica is possible.
  • min_part_hours: the default is 24 hours. Tune it as per the Swift book or the official documentation.
cd /etc/swift
swift-ring-builder account.builder create 7 1 1
swift-ring-builder container.builder create 7 1 1
swift-ring-builder object.builder create 7 1 1

Prepare the drives

You can use a dedicated disk or carve out space from an existing disk to provide storage for Swift. Follow step 2 of this article, copied here:

Attach a disk which would be used for storage or chop off some disk space from the existing disk.
Using additional disks:
Most likely this is done when there is large amount of data to be stored. XFS is the recommended filesystem and is known to work well with Swift. If the additional disk is attached as /dev/sdb then following will do the trick:

# fdisk /dev/sdb
# mkfs.xfs /dev/sdb1
# echo "/dev/sdb1 /srv/node/partition1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0" >> /etc/fstab
# mkdir -p /srv/node/partition1
# mount /srv/node/partition1

Chopping off disk space from the existing disk:
We can chop off disk from existing disks as well. This is usually done for smaller installations or for “proof-of-concept” stage. We can use XFS like before or we can use ext4 as well.

# truncate --size=2G /tmp/swiftstorage
# DEVICE=$(losetup --show -f /tmp/swiftstorage)
# mkfs.ext4 $DEVICE
# mkdir -p /srv/node/partition1
# mount $DEVICE /srv/node/partition1 -t ext4 -o noatime,nodiratime,nobarrier,user_xattr

In either case, you need to

chown -R swift:swift /srv/node/partition1

You also need the mount to happen automatically after a system reboot, so put the mount command into a script, e.g., /opt/swift/bin/mount_devices. Then add a file start_swift.conf under /etc/init/ with the content

description "mount swift drives"
start on runlevel [234]
stop on runlevel [0156]
exec /opt/swift/bin/mount_devices

Make sure to make the script executable

chmod +x /opt/swift/bin/mount_devices

Add the drives to the rings

The single parameter 100 is the weight for load balancing. Note that the partition1 in the command is part of the path /srv/node/partition1; if you change the path, you need to change it in both places. It is not the name of a device such as /dev/sda. Also double-check the IPs and the ports: they are the addresses of the account, container, and object processes. I was blocked by the port for quite a while, simply because the default ports in the conf files are different from the ones used below (not sure why…).

swift-ring-builder account.builder add r1z1-127.0.0.1:6002/partition1 100
swift-ring-builder container.builder add r1z1-127.0.0.1:6001/partition1 100
swift-ring-builder object.builder add r1z1-127.0.0.1:6000/partition1 100

Rebalance the rings to actually create them.

swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance

You will find the *.ring.gz files and a backup folder.

Configure the logs

Put a file named 0-swift.conf under the /etc/rsyslog.d directory, containing only one line:

local0.* /var/log/swift/all.log

And then create the directory and set the right permissions.

mkdir /var/log/swift
chown -R syslog.adm /var/log/swift
chmod -R g+w /var/log/swift

service rsyslog restart

Start the proxy server

swift-init proxy start

If you see warning messages as discussed in this post, you can follow the solutions there, copied here:

curl -o libisal2_2.15.0-3~bpo14.04+1_amd64.deb http://mitaka-trusty.pkgs.mirantis.com/debian/pool/trusty-mitaka-backports/main/l/libisal/libisal2_2.15.0-3~bpo14.04+1_amd64.deb 
sudo dpkg -i libisal2_2.15.0-3~bpo14.04+1_amd64.deb
git clone https://bitbucket.org/kmgreen2/pyeclib.git
cd pyeclib
sudo python setup.py install
sudo apt-get remove python-pyeclib

Configure tempauth in proxy-server.conf

We rely on tempauth for authentication in this setup. If you are using keystone, follow this post or this one.

First, start the memcache:

service memcached start

Add your users to proxy-server.conf under the tempauth block.

[filter:tempauth]
use = egg:swift#tempauth
# You can override the default log routing for this filter here:
...
# <account> is from the user_<account>_<user> name.
# Here are example entries, required for running the tests:
user_admin_admin = admin .admin .reseller_admin
user_test_tester = testing .admin
user_test2_tester2 = testing2 .admin
user_test_tester3 = testing3

The syntax is

user_$SWIFTACCOUNT_$SWIFTUSER = $KEY [group] [group] [...] [storage_url]

which means you can also configure an optional storage URL for each user.

Then, modify these two options:

allow_account_management = true
account_autocreate = true

Start the servers

Create the cache directory

mkdir -p /var/swift/recon
chown -R swift:swift /var/swift/recon

Start the services after making sure the conf files are right, especially the IP and port. The IP is typically set to 0.0.0.0, and the ports should be the same as the ones used when adding the drives.

swift-init account start
swift-init container start
swift-init object start

Then, restart the proxy server

swift-init proxy restart

Verify the setup

Send the authentication request (note that I'm using the admin user defined in proxy-server.conf, with group admin and password admin):

curl -v -H 'X-Auth-User: admin:admin' -H 'X-Auth-Key: admin' http://localhost:8080/auth/v1.0/

The above will get you the X-Auth-Token if everything is right. For instance,

* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /auth/v1.0/ HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
> X-Auth-User: admin:admin
> X-Auth-Key: admin
>
< HTTP/1.1 200 OK
< X-Storage-Url: http://localhost:8080/v1/AUTH_admin
< X-Auth-Token-Expires: 18098
< X-Auth-Token: AUTH_tka1a0d192e57746839c1749f238ba5419
< Content-Type: text/html; charset=UTF-8
< X-Storage-Token: AUTH_tka1a0d192e57746839c1749f238ba5419
< Content-Length: 0
< X-Trans-Id: txb9682bc6bab448659fd50-005881a50f
< X-Openstack-Request-Id: txb9682bc6bab448659fd50-005881a50f
< Date: Fri, 20 Jan 2017 05:50:07 GMT
<
* Connection #0 to host localhost left intact

Next, use the token to access the account:

curl -v -H 'X-Storage-Token: AUTH_tka1a0d192e57746839c1749f238ba5419' http://127.0.0.1:8080/v1/AUTH_admin/

The admin in the URL AUTH_admin should be the same as the username. You should get something like the following.

* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /v1/AUTH_admin HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8080
> Accept: */*
> X-Storage-Token: AUTH_tka1a0d192e57746839c1749f238ba5419
>
< HTTP/1.1 200 OK
< Content-Length: 12
< X-Account-Object-Count: 0
< X-Account-Storage-Policy-Policy-0-Bytes-Used: 0
< X-Account-Storage-Policy-Policy-0-Container-Count: 1
< X-Timestamp: 1484824088.30547
< X-Account-Storage-Policy-Policy-0-Object-Count: 0
< X-Account-Bytes-Used: 0
< X-Account-Container-Count: 1
< Content-Type: text/plain; charset=utf-8
< Accept-Ranges: bytes
< X-Trans-Id: txf57c870a9ba146d9947d0-005881a5cd
< X-Openstack-Request-Id: txf57c870a9ba146d9947d0-005881a5cd
< Date: Fri, 20 Jan 2017 05:53:17 GMT
<
mycontainer
* Connection #0 to host 127.0.0.1 left intact

I was blocked here for quite a while simply because the port in the proxy-server.conf is 6202, not 6002. Be careful!!!

If something goes wrong, check the log file (at /var/log/swift/all.log) for the error message.

You can further verify the setup by creating a container and uploading/downloading a file with

curl -v -H 'X-Storage-Token: AUTH_tka1a0d192e57746839c1749f238ba5419' -X PUT http://127.0.0.1:8080/v1/AUTH_admin/mycontainer
swift -A http://127.0.0.1:8080/auth/v1.0/ -U admin:admin -K admin upload mycontainer some_file

In the last command line, we use the swift client instead of curl. There are other subcommands besides upload: delete, download, list, post, and stat. To avoid putting the URL and user info into every command, you can put the following into .bashrc

export ST_AUTH=http://localhost:8080/auth/v1.0
export ST_USER=admin:admin
export ST_KEY=admin

Start the consistency processes

Swift relies on a few other processes for consistency purposes.

Edit (or create) the /etc/default/rsync file and add the following line:

RSYNC_ENABLE=true

Then, edit (or create) the /etc/rsyncd.conf file, adding the following:

uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
[account]
max connections = 25
path = /srv/node/
read only = false
lock file = /var/lock/account.lock
[container]
max connections = 25
path = /srv/node/
read only = false
lock file = /var/lock/container.lock
[object]
max connections = 25
path = /srv/node/
read only = false
lock file = /var/lock/object.lock

Start the rsync service

service rsync start

Now, you can start the processes by

swift-init account-replicator start
swift-init container-replicator start
swift-init object-replicator start

One useful way of controlling these processes is by

swift-init all start/stop/restart

Setup on multiple nodes

The deployment plan depends on the hardware. One needs to read Part IV of the Swift book before deploying. I found this article very well written; from it one can learn how to deploy a small cluster.

There are a few points one should keep in mind.

  • Copy the swift.conf file to all nodes;
  • Add the drives across all nodes to the rings;
  • Copy the rings to all nodes;

References

  1. Install a Stand-alone, Multi-node OpenStack Swift Cluster with VirtualBox or VMware Fusion and Vagrant (https://thornelabs.net/2014/07/14/install-a-stand-alone-multi-node-openstack-swift-cluster-with-virtualbox-or-vmware-fusion-and-vagrant.html)
  2. OpenStack 101: How to Setup OpenStack Swift (http://blog.adityapatawari.com/2014/01/openstack-101-how-to-setup-openstack_12.html)
  3. OpenStack Swift (the swift book)

Implementing Your Own MapReduce Input Format

There are two steps in every input format, namely getSplits and the record reader. getSplits transforms the original input data into splits, one split for each mapper. Typically, if a big file is fed into MapReduce, it will be divided into splits of the block size (e.g., 64MB or 128MB); if small files are provided, MapReduce will treat each file as a split. The record reader turns each split into key-value pairs, which are consumed by the mapper.

In this article, I use several examples to explain how an input format can be implemented. I'm greatly influenced by a few great references, such as this and this.

The first example is from the book "Hadoop: The Definitive Guide". WholeFileInputFormat treats each individual input file as one value. We do not need to implement our own getSplits function: once isSplitable returns false, no file will be divided, and each file is treated as one split for the mapper.

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {
@Override
protected boolean isSplitable(JobContext context, Path filename) {
return false;
}

@Override
public RecordReader<NullWritable, BytesWritable> createRecordReader(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
WholeFileRecordReader reader = new WholeFileRecordReader();
reader.initialize(inputSplit, taskAttemptContext);
return reader;
}
}

Since each file is a split, the inputSplit passed to the record reader is actually a FileSplit. In the example code below, each returned value is the whole content of a file. The key is null in this example; it could also be the name of the file obtained from the FileSplit.

public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
private FileSplit fileSplit;
private Configuration conf;
private BytesWritable value = new BytesWritable();
private boolean processed = false;

public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
this.fileSplit = (FileSplit) inputSplit;
this.conf = taskAttemptContext.getConfiguration();
}

public boolean nextKeyValue() throws IOException, InterruptedException {
if(!processed) {
byte[] contents = new byte[(int)fileSplit.getLength()];
Path file = fileSplit.getPath();
FileSystem fs = file.getFileSystem(conf);
FSDataInputStream in = null;
try {
in = fs.open(file);
IOUtils.readFully(in, contents, 0, contents.length);
value.set(contents, 0, contents.length);
} finally {
IOUtils.closeStream(in);
}
processed = true;
return true;
}
return false;
}

public NullWritable getCurrentKey() throws IOException, InterruptedException {
return NullWritable.get();
}

public BytesWritable getCurrentValue() throws IOException, InterruptedException {
return value;
}

public float getProgress() throws IOException, InterruptedException {
return processed? 1.0f : 0.0f;
}

public void close() throws IOException {
//do nothing
}
}

I also think TeraInputFormat is a good example. The key difference from the previous example lies in the input file format of terasort. According to the documentation, each input record is a 100-byte array, with the first 10 bytes as the key and the remaining 90 bytes as the value. This raises the challenge of dividing the input file into splits at exact multiples of 100 bytes. In the code below, getSplits simply invokes super.getSplits, ignoring the format issue; then, in the record reader, the offset is adjusted forward to the next multiple of 100.

public class TeraInputFormat extends FileInputFormat<Text, Text> {
static final String PARTITION_FILENAME = "_partition.lst";
private static final String NUM_PARTITIONS = "mapreduce.terasort.num.partitions";
private static final String SAMPLE_SIZE = "mapreduce.terasort.partitions.sample";
static final int KEY_LENGTH = 10;
static final int VALUE_LENGTH = 90;
static final int RECORD_LENGTH = 100;
private static MRJobConfig lastContext = null;
private static List<InputSplit> lastResult = null;

public TeraInputFormat() {
}

public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException {
return new TeraInputFormat.TeraRecordReader();
}

public List<InputSplit> getSplits(JobContext job) throws IOException {
if(job == lastContext) {
return lastResult;
} else {
long t1 = System.currentTimeMillis();
lastContext = job;
lastResult = super.getSplits(job);
long t2 = System.currentTimeMillis();
System.out.println("Spent " + (t2 - t1) + "ms computing base-splits.");
if(job.getConfiguration().getBoolean(TeraScheduler.USE, true)) {
TeraScheduler scheduler = new TeraScheduler((FileSplit[])lastResult.toArray(new FileSplit[0]), job.getConfiguration());
lastResult = scheduler.getNewFileSplits();
long t3 = System.currentTimeMillis();
System.out.println("Spent " + (t3 - t2) + "ms computing TeraScheduler splits.");
}

return lastResult;
}
}

static class TeraRecordReader extends RecordReader<Text, Text> {
private FSDataInputStream in;
private long offset;
private long length;
private static final int RECORD_LENGTH = 100;
private byte[] buffer = new byte[100];
private Text key;
private Text value;

public TeraRecordReader() throws IOException {
}

public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
Path p = ((FileSplit)split).getPath();
FileSystem fs = p.getFileSystem(context.getConfiguration());
this.in = fs.open(p);
long start = ((FileSplit)split).getStart();
this.offset = (100L - start % 100L) % 100L;
this.in.seek(start + this.offset);
this.length = ((FileSplit)split).getLength();
}

public void close() throws IOException {
this.in.close();
}

public Text getCurrentKey() {
return this.key;
}

public Text getCurrentValue() {
return this.value;
}

public float getProgress() throws IOException {
return (float)this.offset / (float)this.length;
}

public boolean nextKeyValue() throws IOException {
if(this.offset >= this.length) {
return false;
} else {
long newRead;
for(int read = 0; read < 100; read = (int)((long)read + newRead)) {
newRead = (long)this.in.read(this.buffer, read, 100 - read);
if(newRead == -1L) {
if(read == 0) {
return false;
}

throw new EOFException("read past eof");
}
}

if(this.key == null) {
this.key = new Text();
}

if(this.value == null) {
this.value = new Text();
}

this.key.set(this.buffer, 0, 10);
this.value.set(this.buffer, 10, 90);
this.offset += 100L;
return true;
}
}
}
}

One of the most common reasons for customizing your own input format is to combine small files into fewer, bigger splits. For this purpose, we still need to disable the splitting of individual files. Note that CombineWholeFileInputFormat extends CombineFileInputFormat, which turns the input into combine file splits (the size of each split can be tuned), each of which may contain multiple small files. Each individual file can be retrieved with an index, as shown in the record reader below.

public class CombineWholeFileInputFormat extends CombineFileInputFormat<Text, BytesWritable> {
@Override
protected boolean isSplitable(JobContext context, Path filename) {
return false;
}

@Override
public RecordReader<Text, BytesWritable> createRecordReader(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException {
return new CombineFileRecordReader<Text, BytesWritable>(
(CombineFileSplit)inputSplit,
taskAttemptContext,
CombineWholeFileRecordReader.class
);
}
}

public class CombineWholeFileRecordReader extends RecordReader<Text, BytesWritable> {
private WholeFileRecordReader reader;
private String fileName;
public CombineWholeFileRecordReader(CombineFileSplit split, TaskAttemptContext context, Integer index) throws IOException, InterruptedException {
FileSplit fileSplit = new FileSplit(split.getPath(index),
split.getOffset(index), split.getLength(index),
split.getLocations());

fileName = fileSplit.getPath().toString();

reader = new WholeFileRecordReader();
reader.initialize(fileSplit, context);
}

public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {

}

public boolean nextKeyValue() throws IOException, InterruptedException {
return reader.nextKeyValue();
}

public Text getCurrentKey() throws IOException, InterruptedException {
return new Text(fileName);
}

public BytesWritable getCurrentValue() throws IOException, InterruptedException {
return reader.getCurrentValue();
}

public float getProgress() throws IOException, InterruptedException {
return reader.getProgress();
}

public void close() throws IOException {
reader.close();
}
}

Summary

This article demonstrated a few sample implementations of input formats for a MapReduce job. The basic idea is to first divide the whole input data into splits and then read each split with a record reader. Usually, there are already plenty of base input formats that can be leveraged and customized into more sophisticated ones.

Questions Asked When Applying AI

Artificial Intelligence (AI) or Human Intelligence (HI)?

It is not always obvious whether to use AI or HI. If you are confident that you have sufficient data to train sophisticated models, you may prefer AI. Otherwise, HI-based rules are a more straightforward way to tackle the problem. In practice, a hybrid solution combining both AI and HI is usually a wise choice.

Classification or Regression?

It all depends on the output of the problem. If the outcome is continuous, use regression; otherwise use classification. In the digital world nothing is truly continuous, so if there are too many categories, regression is the wiser choice.

Do I need preprocessing?

It is rare that no preprocessing is required. Data usually contains a lot of noise, and if the raw data is fed into the learning algorithm, unpredictable outcomes may follow. Filter out the extraneous data before training, and, most importantly, apply the same filter before applying the algorithm in your real system. When you do encounter an extraneous event, use HI to determine the output for your end user.

Complex or Less complex?

In the world of learning algorithms, the complexity of the problem determines whether to apply a "legacy" algorithm like random forest or SVM, or a "cutting-edge" algorithm like a CNN. Though DNNs beat the smartest people in some areas, they are still quite heavy for some applications, e.g., activity classification on smartphones. Also, keep in mind that those super powerful algorithms need tons of data as learning input. So, before tasting the cutting-edge techniques, be sure to prepare enough "food" for them.

How to avoid over-fitting?

The answer is cross-validation. Depending on how much data you have, you may take out 30% of the data for testing and use the remaining 70% for training. Or you may split the whole data set into 10 folds, take out 1 fold for testing and use the remaining 9 folds for training, rotating through the folds. If you have only a little data, use leave-one-out cross-validation.
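
As a tiny illustration of the 10-fold split described above (the sizes and fold count are arbitrary placeholders):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative k-fold split: each fold is the test set once, the rest is the training set.
public class KFold {
    public static void main(String[] args) {
        int n = 100;  // number of samples (set k = n for leave-one-out)
        int k = 10;   // number of folds
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices);

        for (int fold = 0; fold < k; fold++) {
            List<Integer> test = new ArrayList<>();
            List<Integer> train = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (i % k == fold) test.add(indices.get(i));
                else train.add(indices.get(i));
            }
            // train the model on `train`, evaluate on `test`, then average the k scores
            System.out.printf("fold %d: train=%d test=%d%n", fold, train.size(), test.size());
        }
    }
}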

What’s the cost?

Learning algorithms typically treat all outcomes neutrally, which is rarely appropriate in practice. So be sure to think about your cost function and apply it to the learning algorithm. There are a few ways of applying a cost function even when the learning algorithm is treated as a black box.

How does domain knowledge come in?

In general, domain knowledge belongs to HI. Here, I simply want to emphasize that domain knowledge may come in at multiple points. For example, the output of a classification algorithm may be smoothed with an HMM smoother powered by domain knowledge. Also, choosing the features requires a deep understanding of the problem to be solved. Build a flexible algorithm framework for the problem and then feed in more and more domain knowledge to make it better.

Finally, I want to put a picture of the architecture that I designed for transportation activity detection on smartphones.

Toolbox

Particle Filtering

Particle filtering is essentially a heuristic, continuous search process. Each particle represents a probe into the search space; thus, the number of particles reflects the size of the space being searched. There are a few key steps in the search process: initialization, weight update, and resampling, and each of these steps can be heuristically tuned following physical-world rules. A good example of applying particle filtering can be found in "Robust Indoor Localization on a Commercial Smart-Phone", CMU TR 2011. That paper adopts importance resampling, a typical approach that keeps the important particles in each iteration. Alternatively, another popular approach is to learn the distribution of the particles and re-generate them from the learnt model. For example, if we assume the particles follow a normal distribution (learnt from the mean and stddev of their positions) and the observation implies another normal distribution (prior knowledge), we can then use the joint distribution to generate the new batch of particles. Two must-read papers are Magicol and this one.
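
As a toy illustration of the initialize / update-weight / resample loop (a 1-D example with made-up Gaussian models and parameters, not taken from any of the papers above):

import java.util.Random;

// Illustrative 1-D particle filter step: weight particles by an observation, then resample.
public class ParticleFilter1D {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 1000;
        double[] particles = new double[n];
        double[] weights = new double[n];

        // Initialization: spread probes over the search space.
        for (int i = 0; i < n; i++) particles[i] = rnd.nextDouble() * 100.0;

        double observation = 42.0; // e.g. a position implied by a sensor reading
        double sigma = 5.0;        // assumed observation noise (domain knowledge)

        // Update weights: particles consistent with the observation get more weight.
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double d = particles[i] - observation;
            weights[i] = Math.exp(-d * d / (2 * sigma * sigma));
            sum += weights[i];
        }
        for (int i = 0; i < n; i++) weights[i] /= sum;

        // Importance resampling: draw the next generation in proportion to the weights.
        double[] next = new double[n];
        for (int i = 0; i < n; i++) {
            double r = rnd.nextDouble(), acc = 0;
            for (int j = 0; j < n; j++) {
                acc += weights[j];
                if (acc >= r) {
                    next[i] = particles[j] + rnd.nextGaussian(); // small jitter keeps diversity
                    break;
                }
            }
        }
        particles = next; // in a real tracker this loop repeats for every new observation
    }
}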

Forward Error Correction

The purpose of FEC is to ease the recovery of lost or corrupted packets in a packet-based network. Though there are various ways to accomplish this goal, an easy-to-follow paper is "Effective erasure codes for reliable computer communication protocols", Sigcomm Comput. Commun. Rev. 1997. The key idea is to treat N packets as N unknowns and generate M (M >= N) equations, any N of which are linearly independent, so that any N received equations can recover the original N packets. There is actually no inherent difference between FEC and network coding. The key idea of network coding is to exploit the broadcast nature of wireless communication to save as much bandwidth as possible. Dina Katabi worked on this a few years back; her work can be found here.

Periodicity Detection

There are basically two mechanisms: short-time FFT and auto-correlation. A very easy-to-follow paper is "On Periodicity Detection and Structural Periodic Similarity", SDM'05. The key idea of that paper is to fuse the two for robustness. The paper is written in an interesting fashion and also gives a high-level description of the FFT.

Periodicity detection is in principle independent of the sensor, although with different sensors the specific approach may vary a little. A very good reference using an RGB camera is "Real-Time Periodic Motion Detection, Analysis, and Applications", CVPR 2000. The approach in that paper is quite intuitive and easy to follow.

Though detecting periodic activities has been well studied, it is still challenging to achieve online counting. A good practice for counting is a state machine with two states: (1) periodicity detection, where frequency analysis (FFT) is applied to a sliding window and the peak energy ratio tells whether a periodic activity is observed; (2) edge detection, where the rising/falling edges are counted. There are a few important thresholds: first, the peak energy ratio, which describes how much higher the energy of the peak should be above the ambient level; second, the threshold used to determine whether there is a rising/falling edge. These choices definitely require domain knowledge, and real data should be used to learn them.
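
As a small illustration of the autocorrelation half of this approach (a toy signal and toy parameters; real systems usually pick the first significant peak rather than the global maximum, and fuse the result with an FFT):

// Illustrative autocorrelation-based period estimate: the lag with the strongest
// correlation within [minLag, maxLag] is taken as the period.
public class AutoCorrelationPeriod {
    static int estimatePeriod(double[] x, int minLag, int maxLag) {
        double bestScore = Double.NEGATIVE_INFINITY;
        int bestLag = -1;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double score = 0;
            for (int i = 0; i + lag < x.length; i++) {
                score += x[i] * x[i + lag];
            }
            score /= (x.length - lag); // normalize so longer lags are not penalized
            if (score > bestScore) {
                bestScore = score;
                bestLag = lag;
            }
        }
        return bestLag;
    }

    public static void main(String[] args) {
        double[] signal = new double[256];
        for (int i = 0; i < signal.length; i++) {
            signal[i] = Math.sin(2 * Math.PI * i / 20.0) + 0.1 * Math.random(); // period of ~20 samples
        }
        System.out.println("estimated period: " + estimatePeriod(signal, 5, 35) + " samples");
    }
}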

Local Regression

Regression is used to predict the output of a system based on existing observations. Local regression additionally accounts for locality, a common phenomenon in many physical-world scenarios. Andrew Ng's CS229 lecture notes are a good tutorial for understanding this concept. One example of applying local regression is the Modellet paper, as well as "Refining WI-FI Based Indoor Positioning", ISCAICT 2010. The key technique for avoiding overfitting is leave-one-out cross-validation (LOOCV); the idea of cross-validation is to ensure good performance on an independent test set. More details can be found in the reference papers.
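
A minimal 1-D sketch of the idea, weighting nearby observations with a Gaussian kernel before fitting a line; the bandwidth and data are arbitrary placeholders:

// Illustrative locally weighted linear regression in one dimension.
public class LocalRegression {
    static double predict(double[] x, double[] y, double query, double bandwidth) {
        double sw = 0, swx = 0, swy = 0;
        double[] w = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - query;
            w[i] = Math.exp(-d * d / (2 * bandwidth * bandwidth)); // locality kernel
            sw += w[i];
            swx += w[i] * x[i];
            swy += w[i] * y[i];
        }
        double mx = swx / sw, my = swy / sw;
        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            num += w[i] * (x[i] - mx) * (y[i] - my);
            den += w[i] * (x[i] - mx) * (x[i] - mx);
        }
        double slope = den == 0 ? 0 : num / den;
        return my + slope * (query - mx); // value of the locally fitted line at the query point
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        double[] y = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}; // y = x^2, clearly non-linear globally
        // A single global line would predict about 19.5 here; the local fit is much closer to 12.25.
        System.out.println(predict(x, y, 3.5, 1.0));
    }
}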

Modulation

An easy-to-follow guide to modulation can be found here.

Rolling Shutter

One widely used exposure control is called rolling shutter, where each row has a different exposure starting time. Therefore, if the scene varies quickly, the captured data may differ from row to row. A brief introduction to rolling shutter can be found in "Energy characterization and optimization of image sensing toward continuous mobile vision", MobiSys'13. The exposure time for each row is exactly the exposure time of the camera, which depends on the lighting condition. The difference in starting time between adjacent rows is called the row readout time, which depends on the hardware. To analyze a flickering light signal in the frequency domain, the sampling interval is therefore taken to be the readout time and the sampling duration to be the exposure time. A special case is that if the sampling duration is a multiple of the signal period, the signal is undetectable. Note that the exposure time doesn't affect the highest detectable frequency; however, the longer the sampling duration, the lower the SNR.

Using rolling shutter to detect a flickering LED light source is feasible in theory. However, due to factors like sensor imperfections, exposure time, and background interference, the SNR is typically low. In a common office environment, where the recommended illumination level is 300 to 500 lux, it is impractical to apply this technique.

IMU Sensor Fusion

Modern mobile devices ship with low-cost IMU sensors, i.e., accelerometer, gyroscope, and compass. Ideally, they could be used to track the orientation of the rigid body of the device. However, because of drift, the sensor outputs cannot be used directly. A common way of dealing with this problem is to fuse the readings from the various sensors. This in general belongs to the fields of mechatronics and robotics, and the derivation is quite involved. Fortunately, we can leverage the existing work described in the paper "An efficient orientation filter for inertial and inertial/magnetic sensor arrays" by Sebastian Madgwick in 2010. His code is distributed publicly and used by a lot of open-source projects like this one, and this is a port of the algorithm to C.

Recently, I found that this place does a very good job of explaining the key points of sensor fusion; a copy in PDF is here.

Random Sample Consensus

Random Sample Consensus, usually referred to as RANSAC, is an iterative method for estimating the parameters of a model from a set of observations. The key idea is very straightforward. The algorithm starts by randomly selecting a subset of the observations and fitting the unknown parameters of the model to it. Then it tests the remaining observations and counts how many of them fit the model. If the ratio is higher than a threshold, the model is considered acceptable; otherwise, the model is rejected. The algorithm goes through a finite number of iterations to find the model with the highest consensus ratio. There are two key parameters, namely the thresholds for consensus and for accepting a model. In order to get a good estimate, the user should have solid domain knowledge of the distribution of the data and of the outliers. Examples and an implementation in C++ can be found here and here.

One may wonder why we do not use all the data to train the model. The answer is that sometimes it is unnecessary, and the data may contain outliers that would skew a global fit. One example is when you want to compute the transform matrix from one point cloud to another.
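
A compact sketch of the RANSAC loop for fitting a line y = a*x + b; the thresholds and data below are arbitrary illustrations of the two key parameters mentioned above:

import java.util.Random;

// Illustrative RANSAC: sample a minimal model, count inliers, keep the best acceptable model.
public class RansacLine {
    static double[] fit(double[] xs, double[] ys, int iterations,
                        double inlierThreshold, double minConsensusRatio) {
        Random rnd = new Random();
        double[] best = null;
        int bestInliers = -1;
        for (int it = 0; it < iterations; it++) {
            // 1. Randomly pick a minimal sample: two points define a line.
            int i = rnd.nextInt(xs.length), j = rnd.nextInt(xs.length);
            if (i == j || xs[i] == xs[j]) continue;
            double a = (ys[j] - ys[i]) / (xs[j] - xs[i]);
            double b = ys[i] - a * xs[i];
            // 2. Count how many observations fit the model.
            int inliers = 0;
            for (int k = 0; k < xs.length; k++) {
                if (Math.abs(ys[k] - (a * xs[k] + b)) < inlierThreshold) inliers++;
            }
            // 3. Keep the model with the highest consensus, if it is acceptable at all.
            if (inliers > bestInliers && inliers >= minConsensusRatio * xs.length) {
                bestInliers = inliers;
                best = new double[]{a, b};
            }
        }
        return best; // null if no model reached the consensus threshold
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        double[] ys = {1, 3, 5, 7, 9, 11, 13, 40, -20, 19}; // y = 2x + 1 plus two outliers
        double[] model = fit(xs, ys, 200, 0.5, 0.6);
        if (model != null) {
            System.out.println("a = " + model[0] + ", b = " + model[1]); // expect roughly a=2, b=1
        }
    }
}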

Diff Algorithm

Diff is an important algorithm in various applications, such as finding the difference between two code segments. There are various ways of doing this; a representative one is described in the paper "An O(ND) Difference Algorithm and Its Variations". One salient point of this paper is its formulation of the problem using an edit graph. A C# implementation and discussion can be found here. A nicer description can be found here.

Kernel Density Estimation (KDE)

Kernel density estimation (KDE) is a technique typically used to smooth the discrete density function, or more precisely the histogram, learnt from a finite set of samples. For instance, suppose we observed 10 samples of the RSSI of a Wi-Fi access point, ranging from -40dB to -90dB. We can derive a histogram by distributing the samples into 10dB-sized bins; however, due to the limited number of samples, we may end up with a coarse-grained shape. KDE is used to deal with such situations. The key idea is to "leak" some probability density into neighboring bins. To achieve this, KDE applies a kernel, commonly uniform, triangular, biweight, normal, or others, across the histogram. An intuitive example with a normal kernel can be found on Wikipedia. Thus, to perform KDE, one needs to choose an appropriate kernel as well as a bandwidth for the kernel. Generally, KDE is an empirical method to make the discrete probability density function more reasonable. However, one definitely needs to understand the domain to ensure that KDE really makes sense.
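
A tiny sketch of KDE with a Gaussian kernel over the RSSI example above; the samples and bandwidth are made up:

// Illustrative Gaussian KDE: each sample leaks density into its neighborhood.
public class GaussianKde {
    static double density(double[] samples, double x, double bandwidth) {
        double sum = 0;
        for (double s : samples) {
            double u = (x - s) / bandwidth;
            sum += Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
        }
        return sum / (samples.length * bandwidth);
    }

    public static void main(String[] args) {
        double[] rssi = {-42, -45, -51, -60, -61, -63, -70, -75, -82, -88}; // 10 observed samples (dB)
        for (int db = -90; db <= -40; db += 10) {
            System.out.printf("p(%d dB) = %.4f%n", db, density(rssi, db, 5.0));
        }
    }
}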

Apriori Algorithm

Apriori is a classic algorithm for mining association rules in a large dataset. One famous application of Apriori is to mine common merchandise itemsets bought by customers. There are two key metrics in Apriori, i.e., support and confidence. Support quantifies the number of transactions containing the itemset of interest. For instance, if there are 1000 transactions in total and 100 of them contain both bread and butter, then the support ratio is 100/1000 = 0.1. The other metric is confidence, which measures the strength of the inference. Using the same example: if 200 transactions contain bread and only 100 of them also have butter, then the confidence is 100/200. In other words, we are 50% confident that if someone buys bread, she will also buy butter. Before mining the dataset, one needs to specify these two metrics. Typically, confidence is set to a very high value like 99% to ensure the derived rules are significant and meaningful. The support threshold depends heavily on domain knowledge. In the market transaction example, support needs to be high to justify rearranging the goods given the cost involved. However, in some cases we care more about the confidence, and the support can be low.

The algorithm works in a bottom-up way. Apriori first finds all one-item sets with support higher than the chosen threshold. Then it combines them to form candidate two-item sets, and so on; the process continues until no larger itemset satisfies the support threshold. Finally, Apriori applies the confidence threshold to filter out invalid rules. There are many implementations of Apriori, which can be found easily.
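
As a small illustration of the two metrics on the bread/butter example (the transactions below are made up):

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative support and confidence computation for the rule bread -> butter.
public class SupportConfidence {
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) hits++;
        }
        return (double) hits / transactions.size();
    }

    public static void main(String[] args) {
        List<Set<String>> tx = Arrays.asList(
                new HashSet<>(Arrays.asList("bread", "butter")),
                new HashSet<>(Arrays.asList("bread")),
                new HashSet<>(Arrays.asList("milk", "butter")),
                new HashSet<>(Arrays.asList("bread", "butter", "milk")));

        double supBreadButter = support(tx, new HashSet<>(Arrays.asList("bread", "butter")));
        double supBread = support(tx, new HashSet<>(Arrays.asList("bread")));
        System.out.println("support({bread, butter})    = " + supBreadButter);            // 0.5
        // confidence(bread -> butter) = support({bread, butter}) / support({bread})
        System.out.println("confidence(bread -> butter) = " + supBreadButter / supBread); // ~0.67
    }
}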

The Philosophy of Wang Yangming (阳明先生的哲学)

Wang Shouren, childhood name Yun, courtesy name Bo'an, was from Yuyao, Zhejiang. I first encountered Master Yangming through 《明朝那些事》 (Those Ming Dynasty Things), whose author, writing as Dangnian Mingyue, called him "a noble man, a pure man, a man of morality, a man above vulgar interests, a man beneficial to the people." Master Yangming's influence on later generations rests first of all on his philosophy, known in his time as the "School of the Mind" (心学). Unfortunately, the book dwells mostly on his deeds; although it mentions "the unity of knowledge and action" (知行合一), it never clearly explains what that really means. So even after reading it, Master Yangming's thought remained, to me, like flowers seen through a fog.

Although Master Yangming enjoys great fame in philosophical circles, he is not widely known among the public. This is a curious phenomenon, and there may be two reasons. First, he does not seem to have liked writing books; his most famous work, 《传习录》 (the Chuanxilu, Instructions for Practical Living), was compiled by his disciples, much like the Analects. A voluminous book would not have suited his character anyway: he valued simplicity, and the ideas he advocated can be summed up in a few sentences, so why write at length? Second, the ideas he championed look, from a modern standpoint, like undisguised idealism. Under a "materialist" ideology this is treated as pure feudal dross; it is lucky not to be banned, let alone promoted. Of course, this may also have something to do with the fact that certain people admire him greatly.

My admittedly shallow understanding of Master Yangming comes solely from the Chuanxilu. In my humble opinion, the root of his thought can be summed up as "the human mind is heaven and earth." The Chuanxilu says:

"The mind is like a mirror. The sage's mind is like a bright mirror; the ordinary person's mind is like a dull one. The recent doctrine of 'investigating things' is like using the mirror to reflect things, putting all the effort into the reflecting, without realizing that the mirror is still dull, so how could it reflect anything? The Master's way of investigating things is like polishing the mirror to make it bright: the effort goes into the polishing, and once the mirror is bright, it never fails to reflect."

There are many more passages of this kind in the Chuanxilu.

So the place where people should really apply their effort is in tempering their own minds. "The unity of knowledge and action" and the "farewell conversation at Tianquan Bridge" are Master Yangming's concrete explanations of how to temper the mind. Many consider "the unity of knowledge and action" the core of his thought, and I think the reason is this: the phrase is catchy and easily strikes a "chord", and readers who do not truly understand it often take it to mean something like "actions speak louder than words" or "practice what you preach", which misses the point entirely; an error of a hair's breadth leads a thousand miles astray.

Once you understand Master Yangming's philosophy, it becomes clear that there is no shortcut to personal growth. Growing up is the process of the mind being tempered; without going through many tangled affairs, one can never "stir the mind, toughen the nature, and add to what one could not do before." At the same time, precisely because the mind is heaven and earth, "polishing the mirror to make it bright" becomes the only and ultimate goal of life. Seeing this clearly, one understands that the obstacles met during growth are merely stones that polish the mirror, to be welcomed with open arms rather than timidly avoided.

2016-05-01: Having understood "the unity of knowledge and action", I see why people say that to judge a person you should look not at what he has said but at what he has done. What he says does not necessarily represent his true thoughts, while what he does is necessarily what best fits his heart.

Install Octave on OSX El Capitan 10.11.3

Installing Octave on the latest OSX can be error-prone. One may download the Octave-Forge bundle from SourceForge here, which however is large and also a bit out of date.

Alternatively, you can use Homebrew, but please beware of a few traps. Here are the steps I followed to successfully install Octave. If you do not have Xcode or Homebrew installed yet, please refer to Google to get them installed properly.

1. Preliminaries

  • Install XQuartz from here.
  • Install gcc (this is for gfortran) with
$ brew install gcc

It may take quite some time to install gcc from source. If you cannot wait, run xcode-select --install before installing gcc. This will have the Mac install a pre-compiled version, which is very fast.

2. Import the scientific computing packages with

$ brew update && brew upgrade
$ brew tap homebrew/science

If you see any warnings, run

$ brew doctor

and follow any suggestions to fix the problem. Then re-import the packages as follows

$ brew untap homebrew/science
$ brew tap homebrew/science

3. Install Octave

$ brew install octave --without-docs

The option --without-docs above is to suppress errors due to a missing TeX installation.

4. Install gnuplot

gnuplot is automatically installed together with Octave, but without support for X11, so we need to reinstall it properly.

$ brew uninstall gnuplot
$ brew install gnuplot --with-x

In my case, I still got the following warnings after the steps above.

warning: ft_render: unable to load appropriate font
warning: could not match any font: *-normal-normal-10

This can be fixed by following this Stack Overflow post. Simply put, you should add the following line to your ~/.bash_profile.

export FONTCONFIG_PATH=/opt/X11/lib/X11/fontconfig

And run

$ source ~/.bash_profile

to reload the config within your terminal. Alternatively, you could restart your terminal.

5. Configurations

Put the following configuration into ~/.octaverc. If there's no such file, just create one yourself.

setenv ("GNUTERM", "X11")

# optional if you are in favor of a more elegant prompt.
PS1('❯❯ ')