Rage Against The Finite-State Machines
What Are They?
A finite-state machine (FSM) is not really a machine, but it does have a finite number of states. I've always found finite-state machines easier to understand with graphs and diagrams. For example, the following would be a simplistic diagram for a (very dumb) dog as a state machine:

Here the dog has 3 states: sitting, barking or wagging its tail. Different events or inputs may force it to change its state. If a dog is calmly sitting and sees a squirrel, it will start barking and won't stop until you pet it again. However, if the dog is sitting and you pet it, we have no idea what might happen. In the Erlang world, the dog could crash (and eventually be restarted by its supervisor). In the real world that would be a freaky event, but your dog would come back after being run over by a car, so it's not all bad.
Here's a cat's state diagram for comparison:

This cat has a single state, and no event can ever change it.
Implementing the cat state machine in Erlang is a fun and simple task:
-module(cat_fsm).
-export([start/0, event/2]).

start() ->
    spawn(fun() -> dont_give_crap() end).

event(Pid, Event) ->
    Ref = make_ref(), % won't care for monitors here
    Pid ! {self(), Ref, Event},
    receive
        {Ref, Msg} -> {ok, Msg}
    after 5000 ->
        {error, timeout}
    end.

dont_give_crap() ->
    receive
        {Pid, Ref, _Msg} -> Pid ! {Ref, meh};
        _ -> ok
    end,
    io:format("Switching to 'dont_give_crap' state~n"),
    dont_give_crap().
We can try the module to see that the cat really never gives a crap:
1> c(cat_fsm).
{ok,cat_fsm}
2> Cat = cat_fsm:start().
<0.67.0>
3> cat_fsm:event(Cat, pet).
Switching to 'dont_give_crap' state
{ok,meh}
4> cat_fsm:event(Cat, love).
Switching to 'dont_give_crap' state
{ok,meh}
5> cat_fsm:event(Cat, cherish).
Switching to 'dont_give_crap' state
{ok,meh}
The same can be done for the dog FSM, except more states are available:
-module(dog_fsm).
-export([start/0, squirrel/1, pet/1]).

start() ->
    spawn(fun() -> bark() end).

squirrel(Pid) -> Pid ! squirrel.

pet(Pid) -> Pid ! pet.

bark() ->
    io:format("Dog says: BARK! BARK!~n"),
    receive
        pet ->
            wag_tail();
        _ ->
            io:format("Dog is confused~n"),
            bark()
    after 2000 ->
        bark()
    end.

wag_tail() ->
    io:format("Dog wags its tail~n"),
    receive
        pet ->
            sit();
        _ ->
            io:format("Dog is confused~n"),
            wag_tail()
    after 30000 ->
        bark()
    end.

sit() ->
    io:format("Dog is sitting. Gooooood boy!~n"),
    receive
        squirrel ->
            bark();
        _ ->
            io:format("Dog is confused~n"),
            sit()
    end.
It should be relatively simple to match each of the states and transitions to what was on the diagram above. Here's the FSM in use:
6> c(dog_fsm).
{ok,dog_fsm}
7> Pid = dog_fsm:start().
Dog says: BARK! BARK!
<0.46.0>
Dog says: BARK! BARK!
Dog says: BARK! BARK!
Dog says: BARK! BARK!
8> dog_fsm:pet(Pid).
pet
Dog wags its tail
9> dog_fsm:pet(Pid).
Dog is sitting. Gooooood boy!
pet
10> dog_fsm:pet(Pid).
Dog is confused
pet
Dog is sitting. Gooooood boy!
11> dog_fsm:squirrel(Pid).
Dog says: BARK! BARK!
squirrel
Dog says: BARK! BARK!
12> dog_fsm:pet(Pid).
Dog wags its tail
pet
13> %% wait 30 seconds
Dog says: BARK! BARK!
Dog says: BARK! BARK!
Dog says: BARK! BARK!
13> dog_fsm:pet(Pid).
Dog wags its tail
pet
14> dog_fsm:pet(Pid).
Dog is sitting. Gooooood boy!
pet
You can follow along with the schema if you want (I usually do, it helps being sure that nothing's wrong).
That's really the core of FSMs implemented as Erlang processes. There are things that could have been done differently: we could have passed state in the arguments of the state functions in a way similar to what we do with servers' main loop. We could also have added init and terminate functions, handled code updates, etc.
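To make the "state in the arguments" idea concrete, here is a minimal sketch of a dog that carries a counter through its state functions. The module name dog_data_fsm, the init/1 helper, the pets/1 query and the counter itself are all made up for illustration; the printing and the timed barking of the original dog are left out to keep it short.

```erlang
-module(dog_data_fsm).
-export([start/0, init/1, squirrel/1, pet/1, pets/1]).

%% start/0 spawns the process; init/1 (hypothetical) sets up the
%% initial data before entering the first state function.
start() ->
    spawn(?MODULE, init, [0]).

init(Pets) ->
    bark(Pets).

squirrel(Pid) -> Pid ! squirrel, ok.
pet(Pid) -> Pid ! pet, ok.

%% Synchronous query: how many pets made the dog change state so far?
pets(Pid) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, pets},
    receive
        {Ref, Count} -> {ok, Count}
    after 5000 ->
        {error, timeout}
    end.

%% Every state function receives the data as an argument and hands the
%% (possibly updated) data over to the next state function.
bark(Pets) ->
    receive
        pet -> wag_tail(Pets + 1);
        {From, Ref, pets} -> From ! {Ref, Pets}, bark(Pets);
        _ -> bark(Pets)
    end.

wag_tail(Pets) ->
    receive
        pet -> sit(Pets + 1);
        {From, Ref, pets} -> From ! {Ref, Pets}, wag_tail(Pets);
        _ -> wag_tail(Pets)
    after 30000 ->
        bark(Pets)
    end.

sit(Pets) ->
    receive
        squirrel -> bark(Pets);
        {From, Ref, pets} -> From ! {Ref, Pets}, sit(Pets);
        _ -> sit(Pets)
    end.
```

The function being executed is still the state, exactly as before; the argument is just data that rides along from one state to the next.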
Another difference between the dog and cat FSMs is that the cat's events are synchronous and the dog's events are asynchronous. In a real FSM, both could be used in a mixed manner, but I went for the simplest representation out of pure untapped laziness. There are other forms of event the examples do not show: global events that can happen in any state.
One example of such an event could be when the dog gets a sniff of food. Once the smell food event is triggered, no matter what state the dog is in, he'd go looking for the source of food.
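With hand-rolled processes, one possible way to get a global event is to let every state function pass whatever it doesn't handle itself to a shared function. The sketch below assumes a reduced dog with only sit, bark and a made-up find_food state; the module name dog_global_fsm, the smell_food and eat events, and the global/5 helper are illustrative names, not part of the earlier dog_fsm module.

```erlang
-module(dog_global_fsm).
-export([start/0, event/2]).

start() ->
    spawn(fun() -> sit() end).

%% Send an event and wait for the name of the state the dog ends up in.
event(Pid, Event) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Event},
    receive
        {Ref, State} -> {ok, State}
    after 5000 ->
        {error, timeout}
    end.

sit() ->
    receive
        {From, Ref, squirrel} -> reply(From, Ref, bark), bark();
        {From, Ref, Event} -> global(From, Ref, Event, fun sit/0, sit)
    end.

bark() ->
    receive
        {From, Ref, pet} -> reply(From, Ref, sit), sit();
        {From, Ref, Event} -> global(From, Ref, Event, fun bark/0, bark)
    end.

find_food() ->
    receive
        {From, Ref, eat} -> reply(From, Ref, sit), sit();
        {From, Ref, Event} -> global(From, Ref, Event, fun find_food/0, find_food)
    end.

%% Events no state handled on its own end up here, whatever the state was.
global(From, Ref, smell_food, _Stay, _Name) ->
    reply(From, Ref, find_food),
    find_food();
global(From, Ref, _Unknown, Stay, Name) ->
    reply(From, Ref, Name),
    Stay().

reply(From, Ref, State) ->
    From ! {Ref, State}.
```

Having to repeat the catch-all clause in every state function is exactly the kind of bookkeeping a generic behaviour can take off our hands.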
Now we won't spend too much time implementing all of this in our 'written-on-a-napkin' FSM. Instead we'll move directly to the gen_fsm behaviour.
Generic Finite-State Machines
The gen_fsm behaviour is somewhat similar to gen_server in that it is a specialised version of it. The biggest difference is that rather than handling calls and casts, we're handling synchronous and asynchronous events. Much like our dog and cat examples, each state is represented by a function. Again, we'll go through the callbacks our modules need to implement in order to work.
init
This is the same init/1 as used for generic servers, except the return values accepted are {ok, StateName, Data}, {ok, StateName, Data, Timeout}, {ok, StateName, Data, hibernate} and {stop, Reason}. The stop tuple works in the same manner as for gen_servers, and hibernate and Timeout keep the same semantics.
What's new here is that StateName variable. StateName is an atom and represents the next callback function to be called.

StateName
The functions StateName/2 and StateName/3 are placeholder names and you are to decide what they will be. Let's suppose the init/1 function returns the tuple {ok, sitting, dog}. This means the finite state machine will be in a sitting state. This is not the same kind of state as we had seen with gen_server; it is rather equivalent to the sit, bark and wag_tail states of the previous dog FSM. These states dictate a context in which you handle a given event.
An example of this would be someone calling you on your phone. If you're in the state 'sleeping on a Saturday morning', your reaction might be to yell in the phone. If your state is 'waiting for a job interview', chances are you'll pick up the phone and answer politely. On the other hand, if you're in the state 'dead', then I am surprised you can even read this text at all.
Back to our FSM. The init/1 function said we should be in the sitting state. Whenever the gen_fsm process receives an event, either the function sitting/2 or sitting/3 will be called. The sitting/2 function is called for asynchronous events and sitting/3 for synchronous ones.
The arguments for sitting/2 (or generally StateName/2) are Event, the actual message sent as an event, and StateData, the data that was carried over the calls. sitting/2 can then return the tuples {next_state, NextStateName, NewStateData}, {next_state, NextStateName, NewStateData, Timeout}, {next_state, NextStateName, NewStateData, hibernate} and {stop, Reason, NewStateData}.
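As a rough illustration of init/1 and StateName/2 working together, here is a minimal sketch of a two-state dog. The module name dog_gen_fsm and the sitting and barking state names are assumptions made for this example. It also assumes an OTP release that still ships gen_fsm, which has been deprecated in favour of gen_statem since OTP 20, hence the compile attribute that silences the deprecation warnings.

```erlang
-module(dog_gen_fsm).
-behaviour(gen_fsm).
%% gen_fsm is deprecated in recent OTP releases; silence the warnings.
-compile(nowarn_deprecated_function).

-export([start/0, squirrel/1, pet/1]).
-export([init/1, sitting/2, barking/2, handle_event/3,
         handle_sync_event/4, handle_info/3, terminate/3, code_change/4]).

start() -> gen_fsm:start(?MODULE, [], []).

%% Asynchronous events: they end up in StateName/2.
squirrel(Pid) -> gen_fsm:send_event(Pid, squirrel).
pet(Pid) -> gen_fsm:send_event(Pid, pet).

%% The FSM starts in the 'sitting' state with 'dog' as its data.
init([]) -> {ok, sitting, dog}.

%% sitting/2 is only called while the current state name is 'sitting'.
sitting(squirrel, Data) -> {next_state, barking, Data};
sitting(_Event, Data) -> {next_state, sitting, Data}.

%% barking/2 is only called while the current state name is 'barking'.
barking(pet, Data) -> {next_state, sitting, Data};
barking(_Event, Data) -> {next_state, barking, Data}.

%% The remaining callbacks do nothing interesting in this sketch.
handle_event(_Event, StateName, Data) ->
    {next_state, StateName, Data}.
handle_sync_event(_Event, _From, StateName, Data) ->
    {reply, ok, StateName, Data}.
handle_info(_Info, StateName, Data) ->
    {next_state, StateName, Data}.
terminate(_Reason, _StateName, _Data) -> ok.
code_change(_OldVsn, StateName, Data, _Extra) -> {ok, StateName, Data}.
```

The same pet event is simply swallowed by sitting/2 but calms the dog down in barking/2: the state name picks the function, and the function decides what the event means.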
The arguments for sitting/3 are similar, except there is a From variable in between Event and StateData. The From variable is used in exactly the same way as it was for gen_servers, including gen_fsm:reply/2. The StateName/3 functions can return the following tuples:
{reply, Reply, NextStateName, NewStateData}
{reply, Reply, NextStateName, NewStateData, Timeout}
{reply, Reply, NextStateName, NewStateData, hibernate}

{next_state, NextStateName, NewStateData}
{next_state, NextStateName, NewStateData, Timeout}
{next_state, NextStateName, NewStateData, hibernate}

{stop, Reason, Reply, NewStateData}
{stop, Reason, NewStateData}
Note that there's no limit on how many of these functions you can have, as long as they are exported. The atoms returned as NextStateName in the tuples will determine whether the function will be called or not.
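Here is a small sketch of the synchronous side, showing both ways of answering the caller: returning a reply tuple, or calling gen_fsm:reply/2 yourself and returning next_state. The module name dog_sync_fsm, the poke/2 function and the counter kept as state data are invented for the example, and the same assumption applies as for any gen_fsm code today: the behaviour is deprecated in favour of gen_statem and must still be present in your OTP release.

```erlang
-module(dog_sync_fsm).
-behaviour(gen_fsm).
%% gen_fsm is deprecated in recent OTP releases; silence the warnings.
-compile(nowarn_deprecated_function).

-export([start/0, poke/2]).
-export([init/1, sitting/3, barking/3, handle_event/3,
         handle_sync_event/4, handle_info/3, terminate/3, code_change/4]).

start() -> gen_fsm:start(?MODULE, [], []).

%% Synchronous event: ends up in StateName/3 and returns the reply.
poke(Pid, Event) -> gen_fsm:sync_send_event(Pid, Event).

%% The data is a counter of how many squirrels set the dog off.
init([]) -> {ok, sitting, 0}.

sitting(squirrel, _From, Count) ->
    {reply, woof, barking, Count + 1};
sitting(_Event, _From, Count) ->
    {reply, ignored, sitting, Count}.

barking(pet, From, Count) ->
    %% Reply by hand with gen_fsm:reply/2, then only switch states.
    gen_fsm:reply(From, calmed_down),
    {next_state, sitting, Count};
barking(_Event, _From, Count) ->
    {reply, woof, barking, Count}.

%% The remaining callbacks do nothing interesting in this sketch.
handle_event(_Event, StateName, Data) ->
    {next_state, StateName, Data}.
handle_sync_event(_Event, _From, StateName, Data) ->
    {reply, ok, StateName, Data}.
handle_info(_Info, StateName, Data) ->
    {next_state, StateName, Data}.
terminate(_Reason, _StateName, _Data) -> ok.
code_change(_OldVsn, StateName, Data, _Extra) -> {ok, StateName, Data}.
```

Replying by hand is mostly useful when the answer has to be sent before some slow work, or from a later state altogether.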
handle_event
In the last section, I mentioned global events that would trigger a specific reaction no matter what state we're in (the dog smelling food will drop whatever it is doing and will instead look for food). For these events that should be treated the same way in every state, the handle_event/3 callback is what you want. The function takes arguments similar to StateName/2 with the exception that it accepts a StateName variable in between them, telling you what the state was when the event was received. It returns the same values as StateName/2.
handle_sync_event
The handle_sync_event/4 callback is to StateName/3 what handle_event/3 is to StateName/2. It handles synchronous global events, takes the same parameters as StateName/3 (with the current StateName added before the data) and returns the same kind of tuples as StateName/3.
Now might be a good time to explain how we know whether an event is global or if it's meant to be sent to a specific state. To determine this we can look at the function used to send an event to the FSM. Asynchronous events aimed at any StateName/2 function are sent with send_event/2, synchronous events to be picked up by StateName/3 are to be sent with sync_send_event/2-3.
The two equivalent functions for global events are send_all_state_event/2 and sync_send_all_state_event/2-3 (quite a long name).
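The routing can be sketched with a dog that reacts to food no matter what it is doing. Everything specific here is an assumption made for the example: the module name dog_food_fsm, the eating state, and the smell_food and current_state events. As with any gen_fsm code, it also assumes an OTP release that still ships the (deprecated) behaviour.

```erlang
-module(dog_food_fsm).
-behaviour(gen_fsm).
%% gen_fsm is deprecated in recent OTP releases; silence the warnings.
-compile(nowarn_deprecated_function).

-export([start/0, squirrel/1, smell_food/1, current_state/1]).
-export([init/1, sitting/2, barking/2, eating/2, handle_event/3,
         handle_sync_event/4, handle_info/3, terminate/3, code_change/4]).

start() -> gen_fsm:start(?MODULE, [], []).

%% State-specific asynchronous event: routed to StateName/2.
squirrel(Pid) -> gen_fsm:send_event(Pid, squirrel).
%% Global asynchronous event: routed to handle_event/3.
smell_food(Pid) -> gen_fsm:send_all_state_event(Pid, smell_food).
%% Global synchronous event: routed to handle_sync_event/4.
current_state(Pid) -> gen_fsm:sync_send_all_state_event(Pid, current_state).

init([]) -> {ok, sitting, dog}.

sitting(squirrel, Data) -> {next_state, barking, Data};
sitting(_Event, Data) -> {next_state, sitting, Data}.

barking(_Event, Data) -> {next_state, barking, Data}.

eating(_Event, Data) -> {next_state, eating, Data}.

%% Whatever the current state is, food wins.
handle_event(smell_food, _StateName, Data) ->
    {next_state, eating, Data};
handle_event(_Event, StateName, Data) ->
    {next_state, StateName, Data}.

%% Answer with the current state name and stay where we are.
handle_sync_event(current_state, _From, StateName, Data) ->
    {reply, StateName, StateName, Data};
handle_sync_event(_Event, _From, StateName, Data) ->
    {reply, unknown, StateName, Data}.

handle_info(_Info, StateName, Data) ->
    {next_state, StateName, Data}.
terminate(_Reason, _StateName, _Data) -> ok.
code_change(_OldVsn, StateName, Data, _Extra) -> {ok, StateName, Data}.
```

Note that it is the sending function, not the content of the event, that decides which callback gets it: the same smell_food atom sent with send_event/2 would land in sitting/2 or barking/2 instead.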
code_change
This works exactly the same as it did for gen_servers except that it takes an extra state parameter when called like code_change(OldVersion, StateName, Data, Extra), and returns a tuple of the form {ok, NextStateName, NewStateData}.
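As a tiny sketch of what such a callback might look like, suppose a hypothetical older version of a dog FSM kept the bare atom dog as its data while the newer one wants {dog, PetCount}. The module name dog_upgrade and both data shapes are made up for the example; only the argument order and the return tuple come from the description above.

```erlang
-module(dog_upgrade).
-export([code_change/4]).

%% Hypothetical upgrade: convert the old data (the atom 'dog') into the
%% new {dog, PetCount} shape, and stay in whatever state we were in.
code_change(_OldVersion, StateName, dog, _Extra) ->
    {ok, StateName, {dog, 0}};
%% Data already in the new shape is passed through untouched.
code_change(_OldVersion, StateName, Data, _Extra) ->
    {ok, StateName, Data}.
```

Returning a different NextStateName is also allowed, which is handy if an upgrade renames or removes a state function.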
terminate
This should, again, act a bit like what we have for generic servers. terminate/3 should do the opposite of init/1.
A Trading System Specification
It's time to put all of this in practice. Many Erlang tutorials about finite-state machines use examples containing telephone switches and similar things. It's my guess that most programmers will rarely have to deal with telephone switches for state machines. Because of that, we're going to look at an example which is more fitting for many developers: we'll design and implement an item trading system for some fictional and non-existing video game.
The design I have picked is somewhat challenging. Rather than using a broker through which players route items and confirmations (which, frankly, would be easier), we're going to implement a server where both players speak to each other directly (which would have the advantage of being distributable).
Because the implementation is tricky, I'll spend a good while describing it, the kind of problems to be faced and the ways to fix them.
First of all, we should define the actions that can be done by our players when trading. The first is asking for a trade to be set up. The other user should also be able to accept that trade. We won't give them the right to deny a trade, though, because we want to keep things simple. It will be easy to add this feature once the whole thing is done.
Once the trade is set up, our users should be able to negotiate with each other. This means they should be able to make offers and then retract them if they want. When both players are satisfied with the offer, they can each declare themselves as ready to finalise the trade. The data should then be saved somewhere on both sides. At any point in time, it should also make sense for any of the players to cancel the whole trade. Some pleb could be offering only items deemed unworthy to the other party (who might be very busy) and so it should be possible to backhand them with a well-deserved cancellation.
In short, the following actions should be possible:
- ask for a trade
- accept a trade
- offer items
- retract an offer
- declare self as ready
- brutally cancel the trade
Now, when each of these actions is taken, the other player's FSM should be made aware of it. This makes sense, because when Jim tells his FSM to send an item to Carl, Carl's FSM has to be made aware of it. This means both players can talk to their own FSM, which will talk to the other's FSM. This gives us something a bit like this:

The first thing to notice when we have two identical processes communicating with each other is that we have to avoid synchronous calls as much as possible. The reason for this is that if Jim's FSM sends a message to Carl's FSM and then waits for its reply while at the same time Carl's FSM sends a message over to Jim's FSM and waits for its own specific reply, both end up waiting for the other without ever replying. This effectively freezes both FSMs. We have a deadlock.
One solution to this is to wait for a timeout and then move on, but then there will be leftover messages in both processes' mailboxes and the protocol will be messed up. This certainly is a can of worms, and so we want to avoid it.
The simplest way to do it is to avoid all synchronous messages and go fully asynchronous. Note that Jim might still make a synchronous call to his own FSM; there's no risk here because the FSM won't need to call Jim and so no deadlock can occur between them.
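The deadlock is easy to reproduce with two plain processes, without any of the trading logic. In this sketch, the module name deadlock_demo, the message shapes and the 500 ms timeout are all arbitrary choices made for the illustration: each peer sends a synchronous request to the other and then blocks waiting for the answer, so neither ever gets around to serving the request sitting in its own mailbox.

```erlang
-module(deadlock_demo).
-export([run/0]).

%% Each peer waits to learn who its partner is, sends it a request and
%% then blocks until a reply arrives. While blocked, it never looks at
%% the request waiting in its own mailbox, so nobody ever answers.
peer(Parent) ->
    Other = receive {partner, Pid} -> Pid end,
    Ref = make_ref(),
    Other ! {request, self(), Ref},
    Result = receive
                 {reply, Ref, Answer} -> {ok, Answer}
             after 500 ->
                 %% Giving up leaves the partner's request unread in our
                 %% mailbox: the 'leftover messages' problem.
                 timeout
             end,
    Parent ! {self(), Result}.

run() ->
    Parent = self(),
    Jim = spawn(fun() -> peer(Parent) end),
    Carl = spawn(fun() -> peer(Parent) end),
    Jim ! {partner, Carl},
    Carl ! {partner, Jim},
    JimResult = receive {Jim, R1} -> R1 end,
    CarlResult = receive {Carl, R2} -> R2 end,
    {JimResult, CarlResult}.
```

Calling deadlock_demo:run() returns {timeout,timeout} after about half a second: both peers gave up, and each one died with an unanswered request from the other still in its mailbox.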
When two of these FSMs communicate together, the whole exchange might look a bit like this:

Both FSMs are in an idle state. When you ask Jim to trade, Jim has to accept before things move on. Then both of you can offer items or withdraw them. When you are both declaring yourself ready, the trade can take place. This is a simplified version of all that can happen and we'll see all possible cases with more detail in the next paragraphs.
Here comes the tough part: defining the state diagram and how state transitions happen. Usually a good bit of thinking goes into this, because you have to think of all the small things that could go wrong. Some things might go wrong even after having reviewed it many times. Because of this, I'll simply put the one I decided to implement here and then explain it.

At first, both finite-state machines start in the idle state. At this point, one thing we can do is ask some other player to negotiate with us:

We go into idle_wait mode in order to wait for an eventual reply after our FSM forwarded the demand. Once the other FSM sends the reply, ours can switch to negotiate:

The other player should also be in negotiate state after this. Obviously, if we can invite the other, the other can invite us. If all goes well, this should end up looking like this:

So this is pretty much the opposite of the two previous state diagrams bundled into one. Note that we expect the player to accept the offer in this case. What happens if by pure luck, we ask the other player to trade with us at the same time he asks us to trade?

What happens here is that both clients ask their own FSM to negotiate with the other one. As soon as the ask negotiate messages are sent, both FSMs switch to idle_wait state. Then they will be able to process the negotiation question. If we review the previous state diagrams, we see that this combination of events is the only time we'll receive ask negotiate messages while in the idle_wait state. Consequently, we know that getting these messages in idle_wait means that we hit the race condition and can assume both users want to talk to each other. We can move both of them to negotiate state. Hooray.
So now we're negotiating. According to the list of actions I listed earlier, we must support users offering items and then retracting the offer:

All this does is forward our client's message to the other FSM. Both finite-state machines will need to hold a list of items offered by either player, so they can update that list when receiving such messages. We stay in the negotiate state after this; maybe the other player wants to offer items too:

Here, our FSM basically acts in a similar manner. This is normal. Once we get tired of offering things and think we're generous enough, we have to say we're ready to officialise the trade. Because we have to synchronise both players, we'll have to use an intermediary state, as we did for idle and idle_wait:

What we do here is that as soon as our player is ready, our FSM asks Jim's FSM if he's ready. Pending its reply, our own FSM falls into its wait state. The reply we'll get will depend on Jim's FSM state: if it's in wait state, it'll tell us that it's ready. Otherwise, it'll tell us that it's not ready yet. That's precisely what our FSM automatically replies to Jim if he asks us if we are ready when in negotiate state:

Our finite state machine will remain in negotiate mode until our player says he's ready. Let's assume he did and we're now in the wait state. However, Jim's not there yet. This means that when we declared ourselves as ready, we'll have asked Jim if he was also ready and his FSM will have replied 'not yet':

He's not ready, but we are. We can't do much but keep waiting. While waiting for Jim, who's still negotiating by the way, it is possible that he will try to send us more items or maybe cancel his previous offers:

Of course, we want to avoid Jim removing all of his items and then clicking "I'm ready!", screwing us over in the process. As soon as he changes the items offered, we go back into the negotiate state so we can either modify our own offer, or examine the current one and decide we're ready. Rinse and repeat.
At some point, Jim will be ready to finalise the trade too. When this happens, his finite-state machine will ask ours if we are ready:

What our FSM does is reply that we indeed are ready. We stay in the wait state and refuse to move to the ready state though. Why is this? Because there's a potential race condition! Imagine that the following sequence of events takes place, without doing this necessary step:

This is a bit complex, so I'll explain. Because of the way messages are received, we could possibly only process the item offer after we declared ourselves ready and also after Jim declared himself as ready. This means that as soon as we read the offer message, we switch back to negotiate state. During that time, Jim will have told us he is ready. If he were to change states right there and move on to ready (as illustrated above), he'd be caught waiting indefinitely while we wouldn't know what the hell to do. This could also happen the other way around! Ugh.
One way to solve this is by adding one layer of indirection (Thanks to David Wheeler). This is why we stay in wait mode and send 'ready!' (as shown in our previous state diagram). Here's how we deal with that 'ready!' message, assuming we were already in the wait state because we told our FSM we were ready beforehand:

When we receive 'ready!' from the other FSM, we send 'ready!' back again. This is to make sure that we won't have the 'double race condition' mentioned above. This will create a superfluous 'ready!' message in one of the two FSMs, but we'll just have to ignore it in this case. We then send an 'ack' message (and Jim's FSM will do the same) before moving to ready state. The reason why this 'ack' message exists is due to some implementation details about synchronising clients. I've put it in the diagram for the sake of being correct, but I won't explain it until later. Forget about it for now. We finally managed to synchronise both players. Whew.
So now there's the ready state. This one is a bit special. Both players are ready and have basically given the finite-state machines all the control they need. This lets us implement a bastardized version of a two-phase commit to make sure things go right when making the trade official:

Our version (as described above) will be rather simplistic. Writing a truly correct two-phase commit would require a lot more code than what is necessary for us to understand finite-state machines.
Finally, we only have to allow the trade to be cancelled at any time. This means that somehow, no matter what state we're in, we're going to listen to the 'cancel' message from both sides and quit the transaction. It should also be common courtesy to let the other side know we're gone before leaving.
Alright! It's a whole lot of information to absorb at once. Don't worry if it takes a while to fully grasp it. It took a bunch of people to look over my protocol to see if it was right, and even then we all missed a few race conditions that I then caught a few days later when reviewing the code while writing this text. It's normal to need to read it more than once, especially if you are not used to asynchronous protocols. If this is the case, I fully encourage you to try and design your own protocol. Then ask yourself "what happens if two people do the same actions very fast? What if they chain two other events quickly? What do I do with messages I don't handle when changing states?" You'll see that the complexity grows real fast. You might find a solution similar to mine, possibly a better one (let me know if this is the case!) No matter the outcome, it's a very interesting thing to work on and our FSMs are still relatively simple.
Once you've digested all of this (or before, if you're a rebel reader), you can go to the next section, where we implement the gaming system. For now you can take a nice coffee break if you feel like doing so.

Game trading between two players
The first thing that needs to be done to implement our protocol with OTP's gen_fsm is to create the interface. There will be 3 callers for our module: the player, the gen_fsm behaviour and the other player's FSM. We will only need to export the player function and gen_fsm functions, though. This is because the other FSM will also run within the trade_fsm module and can access them from the inside:
-module(trade_fsm).
-behaviour(gen_fsm).

%% public API
-export([start/1, start_link/1, trade/2, accept_trade/1,
         make_offer/2, retract_offer/2, ready/1, cancel/1]).
%% gen_fsm callbacks
-export([init/1, handle_event/3, handle_sync_event/4, handle_info/3,
         terminate/3, code_change/4,
         % custom state names
         idle/2, idle/3, idle_wait/2, idle_wait/3, negotiate/2,
         negotiate/3, wait/2, ready/2, ready/3]).
So that's our API. You can see I'm planning on having some functions being both synchronous and asynchronous. This is mostly because we want our client to call us synchronously in some cases, but the other FSM can do it asynchronously. Having the client synchronous simplifies our logic a whole lot by limiting the number of contradicting messages that can be sent one after the other. We'll get there. Let's first implement the actual public API according to the protocol defined above:
%%% PUBLIC API
start(Name) ->
    gen_fsm:start(?MODULE, [Name], []).

start_link(Name) ->
    gen_fsm:start_link(?MODULE, [Name], []).

%% ask for a begin session. Returns when/if the other accepts
trade(OwnPid, OtherPid) ->
    gen_fsm:sync_send_event(OwnPid, {negotiate, OtherPid}, 30000).

%% Accept someone's trade offer.
accept_trade(OwnPid) ->
    gen_fsm:sync_send_event(OwnPid, accept_negotiate).

%% Send an item on the table to be traded
make_offer(OwnPid, Item) ->
    gen_fsm:send_event(OwnPid, {make_offer, Item}).

%% Cancel trade offer
retract_offer(OwnPid, Item) ->
    gen_fsm:send_event(OwnPid, {retract_offer, Item}).

%% Mention that you're ready for a trade. When the other
%% player also declares being ready, the trade is done
ready(OwnPid) ->
    gen_fsm:sync_send_event(OwnPid, ready, infinity).

%% Cancel the transaction.
cancel(OwnPid) ->
    gen_fsm:sync_send_all_state_event(OwnPid, cancel).
This is rather standard; all these 'gen_fsm' functions have been covered before (except start/3-4 and start_link/3-4 which I believe you can figure out) in this chapter.
Next we'll implement the FSM to FSM functions. The first ones have to do with trade setups, when we first want to ask the other user to join us in a trade:
%% Ask the other FSM's Pid for a trade session
ask_negotiate(OtherPid, OwnPid) ->
    gen_fsm:send_event(OtherPid, {ask_negotiate, OwnPid}).

%% Forward the client message accepting the transaction
accept_negotiate(OtherPid, OwnPid) ->
    gen_fsm:send_event(OtherPid, {accept_negotiate, OwnPid}).
The first function asks the other pid if they want to trade, and the second one is used to reply to it (asynchronously, of course).
We can then write the functions to offer and cancel offers. According to our protocol above, this is what they should be like:
%% forward a client's offer
do_offer(OtherPid, Item) ->
    gen_fsm:send_event(OtherPid, {do_offer, Item}).

%% forward a client's offer cancellation
undo_offer(OtherPid, Item) ->
    gen_fsm:send_event(OtherPid, {undo_offer, Item}).
So, now that we've got these calls done, we need to focus on the rest. The remaining calls relate to being ready or not and handling the final commit. Again, given our protocol above, we have three calls: are_you_ready, which can have the replies not_yet or ready!:
%% Ask the other side if he's ready to trade.
are_you_ready(OtherPid) ->
    gen_fsm:send_event(OtherPid, are_you_ready).

%% Reply that the side is not ready to trade
%% i.e. is not in 'wait' state.
not_yet(OtherPid) ->
    gen_fsm:send_event(OtherPid, not_yet).

%% Tells the other fsm that the user is currently waiting
%% for the ready state. State should transition to 'ready'
am_ready(OtherPid) ->
    gen_fsm:send_event(OtherPid, 'ready!').
The only functions left are those which are to be used by both FSMs when doing the commit in the ready state. Their precise usage will be described more in detail later, but for now, the names and the sequence/state diagram from earlier should be enough. Nonetheless, you can still transcribe them to your own version of trade_fsm:
%% Acknowledge that the fsm is in a ready state.
ack_trans(OtherPid) ->
    gen_fsm:send_event(OtherPid, ack).

%% ask if ready to commit
ask_commit(OtherPid) ->
    gen_fsm:sync_send_event(OtherPid, ask_commit).

%% begin the synchronous commit
do_commit(OtherPid) ->
    gen_fsm:sync_send_event(OtherPid, do_commit).
Oh and there's also the courtesy function allowing us to warn the other FSM we cancelled the trade:
notify_cancel(OtherPid) ->
    gen_fsm:send_all_state_event(OtherPid, cancel).
We can now move to the really interesting part: the gen_fsm callbacks. The first callback is init/1. In our case, we'll want each FSM to hold a name for the user it represents (that way, our output will be nicer) in the data it keeps passing on to itself. What else do we want to hold in memory? In our case, we want the other's pid, the items we offer and the items the other offers. We're also going to add the reference of a monitor (so we know to abort if the other dies) and a from field, used to do delayed replies:
-record(state, {name="", other, ownitems=[], otheritems=[], monitor, from}).
In the case of init/1, we'll only care about our name for now. Note that we'll begin in the idle state:
init(Name) -> {ok, idle, #state{name=Name}}.
The next callbacks to consider would be the states themselves. So far I've described the state transitions and calls that can be made, but we'll need a way to make sure everything goes alright. We'll write a few utility functions first:
%% Send players a notice. This could be messages to their clients
%% but for our purposes, outputting to the shell is enough.
notice(#state{name=N}, Str, Args) ->
    io:format("~s: "++Str++"~n", [N|Args]).

%% Unexpected allows to log unexpected messages
unexpected(Msg, State) ->
    io:format("~p received unknown event ~p while in state ~p~n",
              [self(), Msg, State]).
And we can start with the idle state. For the sake of convention, I'll cover the asynchronous version first. This one only needs to handle the other player asking for a trade, because our own player, as you can see from the API functions, will use a synchronous call:
idle({ask_negotiate, OtherPid}, S=#state{}) ->
    Ref = monitor(process, OtherPid),
    notice(S, "~p asked for a trade negotiation", [OtherPid]),
    {next_state, idle_wait, S#state{other=OtherPid, monitor=Ref}};
idle(Event, Data) ->
    unexpected(Event, idle),
    {next_state, idle, Data}.

A monitor is set up to allow us to handle the other dying, and its ref is stored in the FSM's data along with the other's pid, before moving to the idle_wait state. Note that we will report all unexpected events and ignore them by staying in the state we were already in. We can have a few out of band messages here and there that could be the result of race conditions. It's usually safe to ignore them, but we can't easily get rid of them. It's just better not to crash the whole FSM on these unknown, but somewhat expected messages.
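If monitors are still a bit fuzzy, here is a minimal sketch of what one does, outside of any gen_fsm. The module name monitor_demo and the function run/0 are made up for this illustration and are not part of trade_fsm. The watched process waits for a die message before exiting so that the monitor is guaranteed to be in place first:

```erlang
-module(monitor_demo).
-export([run/0]).

%% Hypothetical demo, not part of trade_fsm: watch a process and
%% turn its death into a term we can act on.
run() ->
    %% The watched process stays alive until it is told to die.
    Pid = spawn(fun() -> receive die -> exit(boom) end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! die,
    receive
        %% The VM delivers this message when the process terminates.
        {'DOWN', Ref, process, Pid, Reason} ->
            {other_down, Reason}
    after 1000 ->
        timeout
    end.
```

Calling monitor_demo:run() should return {other_down, boom}. Inside a gen_fsm, the same 'DOWN' tuple shows up in handle_info/3 rather than in a hand-written receive.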
When our own client asks the FSM to contact another player for a trade, it will send a synchronous event. The idle/3 callback will be needed:
idle({negotiate, OtherPid}, From, S=#state{}) ->
    ask_negotiate(OtherPid, self()),
    notice(S, "asking user ~p for a trade", [OtherPid]),
    Ref = monitor(process, OtherPid),
    {next_state, idle_wait, S#state{other=OtherPid, monitor=Ref, from=From}};
idle(Event, _From, Data) ->
    unexpected(Event, idle),
    {next_state, idle, Data}.
We proceed in a way similar to the asynchronous version, except we need to actually ask the other side whether they want to negotiate with us or not. You'll notice that we do not reply to the client yet. This is because we have nothing interesting to say, and we want the client locked and waiting for the trade to be accepted before doing anything. The reply will only be sent if the other side accepts once we're in idle_wait.
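The trick of holding on to From and answering later can be sketched with plain processes. Everything in this example (the module delayed_reply_demo, the {call, From, Request} message shape and the accept message) is an assumption made for illustration; it only mimics the idea and is not how gen_fsm is implemented internally:

```erlang
-module(delayed_reply_demo).
-export([run/0]).

%% Hypothetical stand-in for an FSM that remembers who called it
%% and only replies once a later event arrives.
server(From) ->
    receive
        {call, NewFrom, negotiate} ->
            %% No reply yet: store the caller and keep waiting.
            server(NewFrom);
        accept when From =/= undefined ->
            %% The other side accepted, so unblock the caller now.
            {Pid, Ref} = From,
            Pid ! {Ref, ok},
            ok
    end.

%% Blocking call: send the request, then wait for the tagged reply.
call(Server, Request) ->
    Ref = make_ref(),
    Server ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

run() ->
    Server = spawn(fun() -> server(undefined) end),
    %% Stand-in for the other player accepting a little later.
    spawn(fun() -> timer:sleep(50), Server ! accept end),
    call(Server, negotiate).
```

Here run/0 blocks for roughly 50 milliseconds and then returns ok: the caller stays locked until the accept event shows up, which is the behaviour we want from our client.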
When we're there, we have to deal with the other accepting to negotiate and the other asking to negotiate (the result of a race condition, as described in the protocol):
idle_wait({ask_negotiate, OtherPid}, S=#state{other=OtherPid}) ->
    gen_fsm:reply(S#state.from, ok),
    notice(S, "starting negotiation", []),
    {next_state, negotiate, S};
%% The other side has accepted our offer. Move to negotiate state
idle_wait({accept_negotiate, OtherPid}, S=#state{other=OtherPid}) ->
    gen_fsm:reply(S#state.from, ok),
    notice(S, "starting negotiation", []),
    {next_state, negotiate, S};
idle_wait(Event, Data) ->
    unexpected(Event, idle_wait),
    {next_state, idle_wait, Data}.
This gives us two transitions to the negotiate state, but remember that we must use gen_fsm:reply/2 to reply to our client and tell it that it's okay to start offering items. There's also the case of our FSM's client accepting the trade suggested by the other party:
idle_wait(accept_negotiate, _From, S=#state{other=OtherPid}) ->
    accept_negotiate(OtherPid, self()),
    notice(S, "accepting negotiation", []),
    {reply, ok, negotiate, S};
idle_wait(Event, _From, Data) ->
    unexpected(Event, idle_wait),
    {next_state, idle_wait, Data}.
Again, this one moves on to the negotiate state. Here, we must handle asynchronous queries to add and remove items coming both from the client and the other FSM. However, we have not yet decided how to store items. Because I'm somewhat lazy and I assume users won't trade that many items, simple lists will do it for now. However, we might change our mind at a later point, so it would be a good idea to wrap item operations in their own functions. Add the following functions at the bottom of the file with notice/3 and unexpected/2:
%% adds an item to an item list
add(Item, Items) ->
    [Item | Items].

%% remove an item from an item list
remove(Item, Items) ->
    Items -- [Item].
Simple, but they have the role of isolating the actions (adding and removing items) from their implementation (using lists). We could easily move to proplists, arrays or whatever data structure without disrupting the rest of the code.
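One detail of the list-based version is worth seeing in action: the -- operator removes only the first matching element, and removing an item that isn't there leaves the list untouched. The sketch below copies add/2 and remove/2 into a throwaway module (items_demo and run/0 are names invented for this example):

```erlang
-module(items_demo).
-export([add/2, remove/2, run/0]).

%% Same definitions as in trade_fsm, copied here so the demo
%% compiles on its own.
add(Item, Items) -> [Item | Items].
remove(Item, Items) -> Items -- [Item].

run() ->
    %% Offer boots, then a sword, then a second pair of boots.
    Items = add(boots, add(sword, add(boots, []))),
    {Items,
     remove(boots, Items),    % only one pair of boots goes away
     remove(shield, Items)}.  % never offered: list is unchanged
```

items_demo:run() returns {[boots,sword,boots], [sword,boots], [boots,sword,boots]}. Whether a retracted duplicate should remove one copy or all of them is exactly the kind of decision that stays hidden behind remove/2.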
Using both of these functions, we can implement offering and removing items:
negotiate({make_offer, Item}, S=#state{ownitems=OwnItems}) ->
    do_offer(S#state.other, Item),
    notice(S, "offering ~p", [Item]),
    {next_state, negotiate, S#state{ownitems=add(Item, OwnItems)}};
%% Own side retracting an item offer
negotiate({retract_offer, Item}, S=#state{ownitems=OwnItems}) ->
    undo_offer(S#state.other, Item),
    notice(S, "cancelling offer on ~p", [Item]),
    {next_state, negotiate, S#state{ownitems=remove(Item, OwnItems)}};
%% other side offering an item
negotiate({do_offer, Item}, S=#state{otheritems=OtherItems}) ->
    notice(S, "other player offering ~p", [Item]),
    {next_state, negotiate, S#state{otheritems=add(Item, OtherItems)}};
%% other side retracting an item offer
negotiate({undo_offer, Item}, S=#state{otheritems=OtherItems}) ->
    notice(S, "Other player cancelling offer on ~p", [Item]),
    {next_state, negotiate, S#state{otheritems=remove(Item, OtherItems)}};
This is an ugly aspect of using asynchronous messages on both sides. One set of messages has the form 'make' and 'retract', while the other has 'do' and 'undo'. This is entirely arbitrary and only used to differentiate between player-to-FSM communications and FSM-to-FSM communications. Note that on those coming from our own player, we have to tell the other side about the changes we're making.
Another responsibility is to handle the are_you_ready message we mentioned in the protocol. This one is the last asynchronous event to handle in the negotiate state:
negotiate(are_you_ready, S=#state{other=OtherPid}) ->
    io:format("Other user ready to trade.~n"),
    notice(S,
           "Other user ready to transfer goods:~n"
           "You get ~p, The other side gets ~p",
           [S#state.otheritems, S#state.ownitems]),
    not_yet(OtherPid),
    {next_state, negotiate, S};
negotiate(Event, Data) ->
    unexpected(Event, negotiate),
    {next_state, negotiate, Data}.
As described in the protocol, whenever we're not in the wait state and receive this message, we must reply with not_yet. We're also outputting trade details to the user so a decision can be made.
When such a decision is made and the user is ready, the ready event will be sent. This one should be synchronous because we don't want the user to keep modifying his offer by adding items while claiming he's ready:
negotiate(ready, From, S = #state{other=OtherPid}) ->
    are_you_ready(OtherPid),
    notice(S, "asking if ready, waiting", []),
    {next_state, wait, S#state{from=From}};
negotiate(Event, _From, S) ->
    unexpected(Event, negotiate),
    {next_state, negotiate, S}.
At this point a transition to the wait state should be made. Note that just waiting for the other is not interesting. We save the From variable so we can use it with gen_fsm:reply/2 when we have something to tell to the client.
The wait state is a funny beast. New items might be offered and retracted because the other user might not be ready. It makes sense, then, to automatically roll back to the negotiating state. It would suck to have great items offered to us, only for the other to remove them and declare himself ready, stealing our loot. Going back to negotiation is a good decision:
wait({do_offer, Item}, S=#state{otheritems=OtherItems}) ->
    gen_fsm:reply(S#state.from, offer_changed),
    notice(S, "other side offering ~p", [Item]),
    {next_state, negotiate, S#state{otheritems=add(Item, OtherItems)}};
wait({undo_offer, Item}, S=#state{otheritems=OtherItems}) ->
    gen_fsm:reply(S#state.from, offer_changed),
    notice(S, "Other side cancelling offer of ~p", [Item]),
    {next_state, negotiate, S#state{otheritems=remove(Item, OtherItems)}};
Now that's something meaningful and we reply to the player with the coordinates we stored in S#state.from. The next set of messages we need to worry about are those related to synchronising both FSMs so they can move to the ready state and confirm the trade. For this one we should really focus on the protocol defined earlier.
The three messages we could have are are_you_ready (because the other user just declared himself ready), not_yet (because we asked the other if he was ready and he was not) and ready! (because we asked the other if he was ready and he was).
We'll start with are_you_ready. Remember that in the protocol we said that there could be a race condition hidden there. The only thing we can do is send the ready! message with am_ready/1 and deal with the rest later:
wait(are_you_ready, S=#state{}) ->
    am_ready(S#state.other),
    notice(S, "asked if ready, and I am. Waiting for same reply", []),
    {next_state, wait, S};
We'll be stuck waiting again, so it's not worth replying to our client yet. Similarly, we won't reply to the client when the other side sends a not_yet to our invitation:
wait(not_yet, S = #state{}) ->
    notice(S, "Other not ready yet", []),
    {next_state, wait, S};
On the other hand, if the other is ready, we send an extra ready! message to the other FSM, reply to our own user and then move to the ready state:
wait('ready!', S=#state{}) ->
    am_ready(S#state.other),
    ack_trans(S#state.other),
    gen_fsm:reply(S#state.from, ok),
    notice(S, "other side is ready. Moving to ready state", []),
    {next_state, ready, S};
%% Don't care about these!
wait(Event, Data) ->
    unexpected(Event, wait),
    {next_state, wait, Data}.
You might have noticed that I've used ack_trans/1. In fact, both FSMs should use it. Why is this? To understand this we have to start looking at what goes on in the ready state.

When in the ready state, both players' actions become useless (except cancelling). We won't care about new item offers. This gives us some liberty. Basically, both FSMs can freely talk to each other without worrying about the rest of the world. This lets us implement our bastardization of a two-phase commit. To begin this commit without either player acting, we'll need an event to trigger an action from the FSMs. The ack event from ack_trans/1 is used for that. As soon as we're in the ready state, the message is treated and acted upon; the transaction can begin.
Two-phase commits require synchronous communications, though. This means we can't have both FSMs starting the transaction at once, because they'll end up deadlocked. The secret is to find a way to decide that one finite state machine should initiate the commit, while the other will sit and wait for orders from the first one.
It turns out that the engineers and computer scientists who designed Erlang were pretty smart (well, we knew that already). The pids of any process can be compared to each other and sorted. This can be done no matter when the process was spawned, whether it's still alive or not, or if it comes from another VM (we'll see more about this when we get into distributed Erlang).
Knowing that two pids can be compared and one will be greater than the other, we can write a function priority/2 that will take two pids and tell a process whether it's been elected or not:
priority(OwnPid, OtherPid) when OwnPid > OtherPid -> true;
priority(OwnPid, OtherPid) when OwnPid < OtherPid -> false.
And by calling that function, we can have one process starting the commit and the other following the orders.
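To convince yourself that the election always picks exactly one side, you can try priority/2 on two freshly spawned processes. The wrapper module priority_demo and its run/0 function are hypothetical and only exist for this sketch:

```erlang
-module(priority_demo).
-export([priority/2, run/0]).

%% Same definition as in trade_fsm. Note that there is no clause
%% for equal pids: a process never negotiates with itself.
priority(OwnPid, OtherPid) when OwnPid > OtherPid -> true;
priority(OwnPid, OtherPid) when OwnPid < OtherPid -> false.

run() ->
    %% Two idle processes standing in for the two FSMs.
    A = spawn(fun() -> receive stop -> ok end end),
    B = spawn(fun() -> receive stop -> ok end end),
    %% Each side asks the question from its own point of view.
    Result = {priority(A, B), priority(B, A)},
    A ! stop,
    B ! stop,
    Result.
```

Whichever pid ends up being the greater one, the two answers in the returned tuple are always opposite: one process is told true and the other false, so only one of them will try to lead.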
Here's what this gives us when included in the ready state, after receiving the ack message:
ready(ack, S=#state{}) ->
    case priority(self(), S#state.other) of
        true ->
            try
                notice(S, "asking for commit", []),
                ready_commit = ask_commit(S#state.other),
                notice(S, "ordering commit", []),
                ok = do_commit(S#state.other),
                notice(S, "committing...", []),
                commit(S),
                {stop, normal, S}
            catch Class:Reason ->
                %% abort! Either ready_commit or do_commit failed
                notice(S, "commit failed", []),
                {stop, {Class, Reason}, S}
            end;
        false ->
            {next_state, ready, S}
    end;
ready(Event, Data) ->
    unexpected(Event, ready),
    {next_state, ready, Data}.
This big try ... catch expression is the leading FSM deciding how the commit works. Both ask_commit/1 and do_commit/1 are synchronous. This lets the leading FSM call them freely. You can see that the other FSM just goes and waits. It will then receive the orders from the leading process. The first message should be ask_commit. This is just to make sure both FSMs are still there; nothing wrong happened, they're both dedicated to completing the task:
ready(ask_commit, _From, S) ->
    notice(S, "replying to ask_commit", []),
    {reply, ready_commit, ready, S};
Once this is received, the leading process will ask to confirm the transaction with do_commit. That's when we must commit our data:
ready(do_commit, _From, S) ->
    notice(S, "committing...", []),
    commit(S),
    {stop, normal, ok, S};
ready(Event, _From, Data) ->
    unexpected(Event, ready),
    {next_state, ready, Data}.
And once it's done, we leave. The leading FSM will receive ok as a reply and will know to commit on its own end afterwards. This explains why we need the big try ... catch: if the replying FSM dies or its player cancels the transaction, the synchronous calls will crash after a timeout. The commit should be aborted in this case.
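Here is a small sketch of one way such a synchronous call can blow up, and how try ... catch turns the failure into a value we can act on. Take it with a grain of salt: it uses gen_server:call/2 as a stand-in for a synchronous gen_fsm call, and the module call_failure_demo, the run/0 function and the aborted/committed atoms are all made up for the example:

```erlang
-module(call_failure_demo).
-export([run/0]).

run() ->
    %% Spawn a process and wait until it is definitely gone, so
    %% that the call below targets a dead pid.
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive
        {'DOWN', Ref, process, Pid, _Why} -> ok
    end,
    try
        %% Stand-in for ask_commit/1: a synchronous call to a
        %% process that can no longer answer.
        gen_server:call(Pid, ask_commit),
        committed
    catch
        Class:Reason ->
            %% The call exits instead of returning; abort here.
            {aborted, Class, Reason}
    end.
```

The result is a tuple of the form {aborted, exit, Reason}. The failure surfaces in the calling process as an exit, which is why the leading FSM wraps the whole commit sequence rather than checking return values one by one.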
Just so you know, I defined the commit function as follows:
commit(S = #state{}) ->
    io:format("Transaction completed for ~s. "
              "Items sent are:~n~p,~n received are:~n~p.~n"
              "This operation should have some atomic save "
              "in a database.~n",
              [S#state.name, S#state.ownitems, S#state.otheritems]).
Pretty underwhelming, eh? It's generally not possible to do a true safe commit with only two participants; a third party is usually required to judge if both players did everything right. If you were to write a true commit function, it should contact that third party on behalf of both players, and then do the safe write to a database for them or roll back the whole exchange. We won't go into such details and the current commit/1 function will be enough for the needs of this book.
We're not done yet. We have not yet covered two types of events: a player cancelling the trade and the other player's finite state machine crashing. The former can be dealt with by using the callbacks handle_event/3 and handle_sync_event/4. Whenever the other user cancels, we'll receive an asynchronous notification:
%% The other player has sent this cancel event
%% stop whatever we're doing and shut down!
handle_event(cancel, _StateName, S=#state{}) ->
    notice(S, "received cancel event", []),
    {stop, other_cancelled, S};
handle_event(Event, StateName, Data) ->
    unexpected(Event, StateName),
    {next_state, StateName, Data}.
When we do it we must not forget to tell the other before quitting ourselves:
%% This cancel event comes from the client. We must warn the other
%% player that we have a quitter!
handle_sync_event(cancel, _From, _StateName, S = #state{}) ->
    notify_cancel(S#state.other),
    notice(S, "cancelling trade, sending cancel event", []),
    {stop, cancelled, ok, S};
%% Note: DO NOT reply to unexpected calls. Let the call-maker crash!
handle_sync_event(Event, _From, StateName, Data) ->
    unexpected(Event, StateName),
    {next_state, StateName, Data}.
And voilà! The last event to take care of is when the other FSM goes down. Fortunately, we had set a monitor back in the idle state. We can match on this and react accordingly:
handle_info({'DOWN', Ref, process, Pid, Reason}, _, S=#state{other=Pid, monitor=Ref}) ->
    notice(S, "Other side dead", []),
    {stop, {other_down, Reason}, S};
handle_info(Info, StateName, Data) ->
    unexpected(Info, StateName),
    {next_state, StateName, Data}.
Note that even if the cancel or DOWN events happen while we're in the commit, everything should be safe and nobody should get their items stolen.
Note: we used io:format/2 for most of our messages to let the FSMs communicate with their own clients. In a real world application, we might want something more flexible than that. One way to do it is to let the client send in a Pid, which will receive the notices sent to it. That process could be linked to a GUI or any other system to make the player aware of the events. The io:format/2 solution was chosen for its simplicity: we want to focus on the FSM and the asynchronous protocols, not the rest.
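As a rough idea of what the Pid-based alternative could look like, here is a sketch of a notice function that sends the formatted text to a client process instead of printing it. None of this is in trade_fsm: the module notice_demo, the four-argument notice/4 and the {notice, Text} message shape are assumptions made for the example.

```erlang
-module(notice_demo).
-export([notice/4, run/0]).

%% Hypothetical variant of notice/3: format the text the same way,
%% but deliver it as a message to ClientPid rather than the shell.
notice(ClientPid, Name, Str, Args) ->
    Text = lists:flatten(io_lib:format("~s: " ++ Str, [Name | Args])),
    ClientPid ! {notice, Text},
    ok.

run() ->
    %% The calling process plays the role of the client here.
    notice(self(), "Carl", "offering ~p", [horse]),
    receive
        {notice, Text} -> Text
    after 1000 ->
        timeout
    end.
```

notice_demo:run() returns the string "Carl: offering horse". In the FSM itself, the client's pid would have to be stored in the state record at start-up so that every notice knows where to go.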
Only two callbacks left to cover! They're code_change/4 and terminate/3. For now, we don't have anything to do with code_change/4 and only export it so the next version of the FSM can call it when it is reloaded. Our terminate function is also really short because we didn't handle real resources in this example:
code_change(_OldVsn, StateName, Data, _Extra) ->
    {ok, StateName, Data}.

%% Transaction completed.
terminate(normal, ready, S=#state{}) ->
    notice(S, "FSM leaving.", []);
terminate(_Reason, _StateName, _StateData) ->
    ok.
Whew.
We can now try it. Well, trying it is a bit annoying because we need two processes to communicate to each other. To solve this, I've written the tests in the file trade_calls.erl, which can run 3 different scenarios. The first one is main_ab/0. It will run a standard trade and output everything. The second one is main_cd/0 and will cancel the transaction halfway through. The last one is main_ef/0 and is very similar to main_ab/0, except it contains a different race condition. The first and third tests should succeed, while the second one should fail (with a crapload of error messages, but that's how it goes). You can try it if you feel like it.
That Was Quite Something

If you've found this chapter a bit harder than the others, I must remind you that it's entirely normal. I've just gone crazy and decided to make something hard out of the generic finite-state machine behaviour. If you feel confused, ask yourself these questions: Can you understand how different events are handled depending on the state your process is in? Do you understand how you can transition from one state to the other? Do you know when to use send_event/2 and sync_send_event/2-3 as opposed to send_all_state_event/2 and sync_send_all_state_event/3? If you answered yes to these questions, you understand what gen_fsm is about.
The rest of it with the asynchronous protocols, delaying replies and carrying the From variable, giving a priority to processes for synchronous calls, bastardized two-phase commits and whatnot are not essential to understand. They're mostly there to show what can be done and to highlight the difficulty of writing truly concurrent software, even in a language like Erlang. Erlang doesn't excuse you from planning or thinking, and Erlang won't solve your problems for you. It'll only give you tools.
That being said, if you understood everything about these points, you can be proud of yourself (especially if you had never written concurrent software before). You are now starting to really think concurrently.
Fit for the Real World?
In a real game, there is a lot more stuff going on that could make trading even more complex. Items could be worn by the characters and damaged by enemies while they're being traded. Maybe items could be moved in and out of the inventory while being exchanged. Are the players on the same server? If not, how do you synchronise commits to different databases?
Our trade system is sane when detached from the reality of any game. Before trying to fit it in a game (if you dare), make sure everything goes right. Test it, test it, and test it again. You'll likely find that testing concurrent and parallel code is a complete pain. You'll lose hair, friends and a piece of your sanity. Even after this, you'll have to know your system is always as strong as its weakest link and thus potentially very fragile nonetheless.
Don't Drink Too Much Kool-Aid:
While the model for this trade system seems sound, subtle concurrency bugs and race conditions can often rear their ugly heads a long time after they were written, and even if they've been running for years. While my code is generally bulletproof (yeah, right), you sometimes have to face swords and knives. Beware the dormant bugs.
Fortunately, we can put all of this madness behind us. We'll next see how OTP allows you to handle various events, such as alarms and logs, with the help of the gen_event behaviour.