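State management is a small thing in itself, and there is no fixed rule about whether it needs to be split out at all. I made it a separate service here mainly to emphasize that it really can be built as an independent service.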

When we build an asynchronous state-saving service (in essence, a backup service), it should do the following:

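  • For the matching node it serves, state backup should be essentially invisible. The matcher only has to export its data; all other logic, from write management to scheduling to error detection, should be encapsulated by the status node itself.
  • The write logic of state management should be tolerant and robust. In production, errors must be allowed to happen, but they must also be detected and alerted on promptly.
  • By comparison, the read logic carries the higher reliability requirement. It is called very rarely, but every LoadStatus call means a matching node is starting up, so this path should succeed as reliably as possible.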

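In this demo project, only the first point has been done seriously. I write the others down here in the hope that readers will keep them in mind in practice.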
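The second and third points above can be sketched roughly as follows. This is only an illustration, not code from the demo project: `StatusGuard`, `trySave`, `loadWithRetry`, the alert threshold, and the linear backoff are all hypothetical names and choices, and the `Consumer<String>` stands in for whatever alerting channel a real deployment uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper: tolerant writes, stubborn reads.
final class StatusGuard {
    private final int alertThreshold;      // consecutive failures before an alert is raised
    private final Consumer<String> alert;  // stand-in for a real alerting channel
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    StatusGuard(int alertThreshold, Consumer<String> alert) {
        this.alertThreshold = alertThreshold;
        this.alert = alert;
    }

    // Tolerant write: a failed save never propagates to the caller,
    // but repeated failures are counted and reported.
    boolean trySave(Runnable save) {
        try {
            save.run();
            consecutiveFailures.set(0);
            return true;
        } catch (RuntimeException e) {
            int n = consecutiveFailures.incrementAndGet();
            if (n >= alertThreshold) {
                alert.accept("status save failed " + n + " times in a row: " + e.getMessage());
            }
            return false;
        }
    }

    // Reliable read: retry up to maxAttempts with a linear backoff,
    // and rethrow the last failure only when every attempt has failed.
    static <T> T loadWithRetry(Callable<T> load, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return load.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

In an actor like StatusApp, the save branch would wrap its database call in `trySave`, while the LoadStatus branch would wrap the query in `loadWithRetry`. Whether a blocking retry is acceptable inside an actor depends on the dispatcher setup, so treat this as a starting point only.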

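In practice, this architecture lets the matching node know only about the counter and the other business nodes (mainly the market-data broadcast node discussed later), without needing to know what operational nodes such as state management are doing. The status node handles a very low-frequency operation, typically at the minute level in practice. It can wrap up the fussy rules so that they do not disturb the system's main business, and so that any future change to this operational logic does not disturb the business nodes.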

These eight fallacies all need to be weighed carefully and taken seriously in engineering work.

The logic of Status is very simple, so I took a shortcut here and put the main logic into a single class:

package liu.mars.market;

import akka.actor.*;
import akka.event.Logging;
import akka.event.LoggingAdapter;
import akka.japi.pf.ReceiveBuilder;
import clojure.lang.IFn;
import com.fasterxml.jackson.databind.ObjectMapper;
import jaskell.util.CR;
import liu.mars.market.directive.LoadStatus;
import liu.mars.market.directive.StatusQuery;
import liu.mars.market.status.DashStatus;

import java.time.Duration;

public class StatusApp extends AbstractActorWithTimers {
    private static String status_namespace = "liu.mars.market.status";
    static {
        CR.require(status_namespace);
    }
    private LoggingAdapter log = Logging.getLogger(getContext().getSystem(), this);

    private IFn save;
    private IFn load_latest;
    private ObjectMapper mapper;

    private StatusApp(){
        this.save = CR.var(status_namespace, "save").fn();
        this.load_latest = CR.var(status_namespace, "load-latest").fn();
        this.mapper = new ObjectMapper();
    }

    public static Props props() {
        return Props.create(StatusApp.class, StatusApp::new);
    }

    @Override
    public Receive createReceive() {
        return ReceiveBuilder.create()
                .match(DashStatus.class, msg -> {
                    String data = mapper.valueToTree(msg).toString();
                    save.invoke(data);
                    log.info("received status from {}", getSender().toString());
                }).match(LoadStatus.class, msg -> {
                    String result;
                    if (msg.getSymbol() == null){
                        result = (String) load_latest.invoke();
                    } else {
                        result = (String) load_latest.invoke(msg.getSymbol());
                    }
                    DashStatus status = mapper.readValue(result, DashStatus.class);
                    log.info("load status {} for load request from {}",
                            result, getSender().toString());
                    sender().tell(status, self());
                }).build();
    }

    @Override
    public void preStart() throws Exception {
        super.preStart();
    }

    public static void main(String[] args){
        final String config_namespace = "liu.mars.market.config";
        CR.require(config_namespace);
        ActorSystem system = ActorSystem.create("status");
        ActorRef statusActor = system.actorOf(StatusApp.props(), "status");
        LoggingAdapter log = Logging.getLogger(system, system.scheduler());
        long query_rate = (Long)CR.invoke(config_namespace, "query-rate");
        // First query fires after query_rate seconds; later queries repeat every 60 seconds.
        Cancellable schedule = system.scheduler().schedule(Duration.ofSeconds(query_rate),
                Duration.ofSeconds(60), () -> {
            String matcher_path = CR.invoke(config_namespace, "matcher").toString();
            ActorSelection matcher = system.actorSelection(matcher_path);
            StatusQuery query = new StatusQuery();
            query.setSymbol("btcusdt");
            // The status actor is passed as the sender, so the matcher's reply goes straight to it.
            matcher.tell(query, statusActor);
            log.info("status query to {}", matcher_path);
        }, system.dispatcher());

        system.registerOnTermination(() -> {
            system.stop(statusActor);
            schedule.cancel();
        });


        System.out.println("Ctrl+c to stop");
    }
}

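In main we start a schedule that periodically sends a status query to the matcher. The matching node that receives the query returns its state to the sender named in the request (the status actor), which writes it to the database. This flow is the friendliest one for the matcher, and it also fits the reactive programming style. The corresponding database access code is implemented in Clojure: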

(ns liu.mars.market.status
  (:require [clojure.java.jdbc :as j])
  (:require [liu.mars.market.config :as config])
  (:require [cheshire.core :as c]))

(def db (delay @config/db))

(defn empty-status
  [sym]
  {:symbol          sym
   :asks            []
   :bids            []
   :latest-order-id 0})

(defn dump
  ([data]
   (-> data
       first
       :content
       c/generate-string))
  ([data sym]
   (-> data
       first
       :content
       (#(if (contains? % :symbol)
           %
           (assoc % :symbol sym)))
       c/generate-string)))

(defn load-latest
  ([]
   (-> @db
       (j/query ["select content from status where id=(select max(id) from status)"])
       dump))
  ([sym]
   (let [result (-> @db
                    (j/query
                      [(str "select content "
                            "from status "
                            "where id=(select max(id) from status where meta ->> 'symbol' = ?)")
                       sym]))]
     (if (empty? result)
       (c/generate-string (empty-status sym))
       (dump result sym)))))

(defn save
  ([status]
   (let [data (c/parse-string status true)
         sym (:symbol data)]
     (j/execute! @db ["insert into status(meta, content) values(?, ?)"
                      {:symbol sym}
                      data])))
  ([status source]
   (let [data (c/parse-string status true)
         sym (:symbol data)]
     (j/execute! @db ["insert into status(meta, content) values(?, ?)"
                      {:symbol sym :source source}
                      data]))))

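The queries involved here are so simple that I wrote them directly as inline SQL text. Because this is a demo project, many things in it are not rigorous. For example, both save and load can be called without any symbol information; this exists purely for convenience in development and testing, and no such business case exists in practice. Likewise, the query contains a condition that depends on the internal structure of meta. In practice such a condition must be backed by an index (in PostgreSQL, for instance, an expression index on meta ->> 'symbol'), but we cut that corner here.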

Before a node or a piece of logic starts, besides preparing the database schema, we can also supply the initial data along with it:

create table status
(
  id      serial primary key ,
  meta    jsonb     default '{}'::jsonb,
  content jsonb not null,
  save_at timestamp default now()
);
insert into status(content) values('{"latest-order-id":0, "bids":[], "asks":[], "status":"trading"}');

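Because this program is very simple, our tests are not written rigorously either. For example, after sending the status data, a test should strictly wait for the save to complete before moving on to the next step. We did not take that into account here and simply let the main thread sleep for one second.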

(ns liu.mars.market.inner-test
  (:require [clojure.test :refer :all])
  (:require [liu.mars.market.mock-actor :as mock])
  (:require [liu.mars.actor :refer [! ??]])
  (:require [clojure.java.jdbc :as j]
            [liu.mars.market.config :as config])
  (:import (akka.actor ActorSystem)
           (liu.mars ClojureActor)
           (liu.mars.market StatusApp)
           (akka.testkit.javadsl TestKit)
           (java.util.function Supplier)
           (liu.mars.market.directive StatusQuery LoadStatus)
           (liu.mars.market.status DashStatus)))

(deftest basic-test
  (let [system (ActorSystem/create "test")
        mock-actor (.actorOf system (ClojureActor/propsWithInit mock/init mock/match-mock) "btcusdt")
        status-actor (.actorOf system (StatusApp/props) "status")
        query (doto (StatusQuery.)
                (.setSymbol "btcusdt"))]
    (try
      (j/delete! @config/db :status ["id > ?" 1])
      (is (= 1 (-> @config/db
                   (j/query ["select count(*) as c from status"])
                   first
                   :c)))
      (! mock-actor query status-actor)
      (Thread/sleep 1000)
      (is (= 2 (-> @config/db
                   (j/query ["select count(*) as c from status"])
                   first
                   :c)))
      (finally
        (TestKit/shutdownActorSystem system)))))

(deftest local-test
  (let [system (ActorSystem/create "test")
        mock-actor (.actorOf system (ClojureActor/propsWithInit mock/init mock/match-mock) "btcusdt")
        status-actor (.actorOf system (StatusApp/props) "status")
        query (doto (StatusQuery.)
                (.setSymbol "btcusdt"))]
    (try
      (j/delete! @config/db :status ["id > ?" 1])
      (is (= 1 (-> @config/db
                   (j/query ["select count(*) as c from status"])
                   first
                   :c)))
      (! mock-actor query status-actor)
      (Thread/sleep 1000)
      (is (= 2 (-> @config/db
                   (j/query ["select count(*) as c from status"])
                   first
                   :c)))
      (testing "test in system inner"
        (let [load (doto (LoadStatus.)
                     (.setSymbol "btcusdt"))
              status (?? status-actor load)]
          (is (instance? DashStatus status))
          (is (= (.getLatestOrderId status)
                 (-> @config/db
                     (j/query ["select max((content ->> 'latest-order-id')::bigint) as lid from status"])
                     first
                     :lid)))))
      (testing "run test from select"
        (let [status-selection (.actorSelection system "akka://test/user/status")
              load (doto (LoadStatus.)
                     (.setSymbol "btcusdt"))
              status (?? status-selection load)]
          (is (instance? DashStatus status))
          (is (= (.getLatestOrderId status)
                 (-> @config/db
                     (j/query ["select max((content ->> 'latest-order-id')::bigint) as lid from status"])
                     first
                     :lid)))))
      (testing "run test from select over tcp"
        (let [status-selection (.actorSelection system "akka.tcp://test@127.0.0.1:2554/user/status")
              load (doto (LoadStatus.)
                     (.setSymbol "btcusdt"))
              status (?? status-selection load)]
          (is (instance? DashStatus status))
          (is (= (.getLatestOrderId status)
                 (-> @config/db
                     (j/query ["select max((content ->> 'latest-order-id')::bigint) as lid from status"])
                     first
                     :lid)))))
      (finally
        (TestKit/shutdownActorSystem system)))))
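The fixed one-second sleep in the tests above could be replaced by polling with a timeout, so that a test proceeds as soon as the save is visible and fails clearly when it never arrives. A minimal sketch of that idea, assuming a hypothetical helper named `Await` (it is not part of the demo project or of Akka TestKit):

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition instead of sleeping for a fixed time.
final class Await {
    private Await() {}

    // Checks `condition` immediately and then every `interval`,
    // returning true as soon as it holds or false once `timeout` has elapsed.
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(Math.max(1L, interval.toMillis()));
        }
    }
}
```

In the tests above, the condition would be the row-count query on the status table reaching 2. From Clojure the helper could be called through Java interop, or the same loop could be written directly in Clojure.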

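The test code above covers several different paths: sending a message directly to the actor, sending through a selection, and sending over TCP. So the corresponding configuration file has to be in place when the tests run: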

akka {
  extensions = ["com.romix.akka.serialization.kryo.KryoSerializationExtension$"]

  actor {
    provider = remote

    serializers {
      java = "akka.serialization.JavaSerializer"
      kryo = "com.romix.akka.serialization.kryo.KryoSerializer"
      nippy = "liu.mars.market.NippySerializer"
    }

    serialization-bindings {
      "liu.mars.market.messages.CreateSequence" = kryo
      "liu.mars.market.messages.DropSequence" = kryo
      "liu.mars.market.messages.ListSequences" = kryo
      "liu.mars.market.messages.NextValue" = kryo
      "liu.mars.market.messages.LimitAsk" = kryo
      "liu.mars.market.messages.LimitBid" = kryo
      "liu.mars.market.messages.MarketAsk" = kryo
      "liu.mars.market.messages.MarketBid" = kryo
      "liu.mars.market.messages.Cancel" = kryo
      "liu.mars.market.messages.FindOrder" = kryo
      "liu.mars.market.messages.NextOrder" = kryo
      "liu.mars.market.messages.OrderNotFound" = kryo
      "liu.mars.market.messages.OrderNoMore" = kryo
      "com.fasterxml.jackson.databind.node.ObjectNode" = kryo
      "com.fasterxml.jackson.databind.node.ArrayNode" = kryo
      "clojure.lang.PersistentArrayMap" = nippy
      "clojure.lang.PersistentList" = nippy
      "clojure.lang.PersistentVector" = nippy
      "clojure.lang.LazySeq" = nippy
      "clojure.lang.Keyword" = nippy
      "clojure.lang.Symbol" = nippy
      "java.util.ArrayList" = nippy
    }

    kryo {
      type = "graph"

      idstrategy = "incremental"

      buffer-size = 4096
      max-buffer-size = -1
      kryo-custom-serializer-init = "liu.mars.market.KryoInit"
      kryo-trace = true

    }

  }

  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2554
    }
  }

}

The test profile in project.clj also has to be configured to use this configuration file:

(defproject status-keeper "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :plugins [[lein-junit "1.1.8"]]
  :source-paths ["src/main/clojure"]
  :java-source-paths ["src/main/java"]
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [com.typesafe.akka/akka-actor_2.12 "2.5.19"]
                 [com.typesafe.akka/akka-remote_2.12 "2.5.19"]
                 [liu.mars/jaskell "0.1.3"]
                 [liu.mars/akka-clojure "0.1.2"]
                 [liu.mars/market-messages "0.2"]
                 [org.postgresql/postgresql "42.2.5"]
                 [clj-postgresql "0.7.0"]
                 [org.clojure/java.jdbc "0.7.8"]
                 [com.fasterxml.jackson.core/jackson-core "2.9.6"]
                 [com.fasterxml.jackson.core/jackson-databind "2.9.6"]
                 [com.github.romix.akka/akka-kryo-serialization_2.12 "0.5.2"]]
  :test-paths ["src/test/clojure" "src/test/java"]
  :resource-paths ["resources/main"]
  :junit ["src/test/java"]
  :aot :all
  :main liu.mars.market.StatusApp
  :uberjar-merge-with {#".properties$" [slurp str spit] "reference.conf" [slurp str spit]}
  :profiles {:server  {:jvm-opts       ["-Dconfig.resource=server.conf"]
                       :resource-paths ["resources/server"]}
             :local   {:jvm-opts       ["-Dconfig.resource=server.conf"]
                       :resource-paths ["resources/local"]}
             :test    {:dependencies      [[junit/junit "4.12"]
                                           [com.typesafe.akka/akka-testkit_2.12 "2.5.19"]]
                       :resource-paths    ["resources/test"]
                       :java-source-paths ["src/test/java"]
                       :jvm-opts          ["-Dconfig.resource=test.conf"]}
             :dev     {:resource-paths ["resources/dev"]
                       :source-paths   ["src/notebook"]
                       :jvm-opts       ["-Dconfig.resource=dev.conf"]}
             :gorilla {:source-paths ["src/notebook"]
                       :plugins [[org.clojars.benfb/lein-gorilla "0.5.3"]]}})

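This project is very rough. Its purpose is to isolate the state management logic completely. In practice we will not necessarily depend on a relational database and JSONB: we might write local files directly, we might write to S3, or we might have a more agile and flexible configuration mechanism. All of that is left for readers to explore on their own.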
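As one illustration of swapping the storage backend, here is a minimal sketch of a local-file store hidden behind a small interface. `StatusStore` and `FileStatusStore` are hypothetical names that do not exist in the demo project. Unlike the status table, this sketch keeps only the latest snapshot per symbol and no history.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical storage abstraction: the status actor would depend only on this.
interface StatusStore {
    void save(String symbol, String json) throws IOException;
    Optional<String> loadLatest(String symbol) throws IOException;
}

// Keeps one JSON file per symbol in a local directory.
final class FileStatusStore implements StatusStore {
    private final Path dir;

    FileStatusStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    @Override
    public void save(String symbol, String json) throws IOException {
        Path target = dir.resolve(symbol + ".json");
        // Write to a temporary file first, then move it over the old snapshot,
        // so a crash mid-write does not leave a truncated status file behind.
        Path tmp = Files.createTempFile(dir, symbol, ".tmp");
        Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public Optional<String> loadLatest(String symbol) throws IOException {
        Path target = dir.resolve(symbol + ".json");
        if (!Files.exists(target)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

With an interface like this, a database-backed or S3-backed implementation could be substituted without touching the actor's message handling.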