This tutorial demonstrates both the right ways and wrong ways of benchmarking user Lua code in OpenResty.
```shell
cd ~
```
First of all, make sure the CPU always runs at full speed.
```shell
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
The scaling governor usually defaults to `powersave`, so we need to set it to `performance` instead.
The simplest way to time some Lua code is to run the `resty` command under the shell's `time` command.
```shell
time resty -e 'ngx.re.find("hello, world.", [[\w+\.]], "jo")'
```
But there is a catch: the `resty` command itself has startup and exit overhead.
```shell
time resty -e ''
```
We can see there’s an overhead of about 11 milliseconds on this machine.
Instead, we should time the code from within Lua itself, using the `ngx.now` Lua API function provided by OpenResty.
```shell
restydoc -s ngx.now
```
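`ngx.now` reads nginx's cached time rather than making a system call, so its value does not advance by itself while plain Lua code is running; `ngx.update_time()` forces the cache to refresh. That is why the timing steps below refresh the cache both before and after the timed code. The following sketch makes the effect visible when run with `resty` (the file name `cached_time.lua` and the loop size are arbitrary choices for illustration, not part of the original tutorial):

```lua
-- cached_time.lua: ngx.now() returns nginx's cached time, which does not
-- advance during plain Lua computation unless ngx.update_time() is called.
ngx.update_time()
local t1 = ngx.now()

-- busy work meant to take well over a millisecond (loop size chosen arbitrarily)
local sum = 0
for i = 1, 5e7 do
    sum = sum + i
end

local t2 = ngx.now()   -- still the stale cached value
ngx.update_time()
local t3 = ngx.now()   -- refreshed value

ngx.say("without update_time: ", t2 - t1, " s")
ngx.say("with update_time:    ", t3 - t1, " s")
ngx.say(sum)  -- use the result so the compiler cannot discard the work
```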
Let’s put our Lua code into a file named `./bench.lua` for better readability.
We make the following edits:
- First of all, we make sure the cached time inside nginx is up to date.
- Then we record the begin time, which has millisecond precision.
- Then we add the regex matching call from before.
- Then we update the cached time again.
- Finally, we output the elapsed time by subtracting the begin time from the current time.
- Let’s save the file.
```lua
ngx.update_time()
```
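Put together, the steps above might produce a `bench.lua` along the following lines. This is a sketch reconstructed from the list, not the author's exact file; the variable names and the output format (milliseconds via `ngx.say`) are my own choices.

```lua
-- bench.lua (naive version): time a single call with ngx.now()

-- refresh nginx's cached time so the begin time is current
ngx.update_time()

-- begin time in seconds, with millisecond precision
local begin = ngx.now()

-- the code under test: the regex match from the `time resty -e` example
local from, to = ngx.re.find("hello, world.", [[\w+\.]], "jo")

-- refresh the cached time again before reading the end time
ngx.update_time()

local elapsed = ngx.now() - begin  -- seconds
ngx.say("elapsed: ", elapsed * 1000, " ms")
```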
Then run the `resty` shell command.
```shell
resty bench.lua
```
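It reports a latency of about 1 millisecond. But, as we will soon see, this figure is very inaccurate: a single call is far too short for the millisecond precision of `ngx.now`, and this first call also pays one-time costs such as compiling the regex.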
The correct way is to make the following edits in the `bench.lua` file:
- Put the call into a Lua function named `target`.
- Then call this function 100 times first as a warmup. After this loop runs, the `target` function should be JIT compiled.
- Then, inside the timed code region, call it repeatedly 10 million times.
- Finally, compute the average time per call.
```lua
local function target()
```
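Following those four steps, the revised `bench.lua` might look roughly like this. Again, this is a sketch rather than the author's exact file; the names `N`, `begin`, and `elapsed` and the nanosecond output format are my own choices.

```lua
-- bench.lua (revised): warm up first, then time many calls and average

-- the code under test, wrapped in a function
local function target()
    ngx.re.find("hello, world.", [[\w+\.]], "jo")
end

-- warm-up: per the steps above, target should be JIT compiled after these 100 calls
for _ = 1, 100 do
    target()
end

local N = 10 * 1000 * 1000  -- 10 million calls in the timed region

ngx.update_time()
local begin = ngx.now()

for _ = 1, N do
    target()
end

ngx.update_time()
local elapsed = ngx.now() - begin  -- total seconds for N calls

-- average time per call, converted from seconds to nanoseconds
ngx.say(elapsed / N * 1e9, " ns per call")
```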
We now run this script again.
```shell
resty bench.lua
```
This time it is merely about 30 nanoseconds per call, tens of thousands of times faster than the previous 1-millisecond result!
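The averaging step is a plain division of the total elapsed time by the iteration count N. As a rough consistency check using only the numbers above: at about 30 ns per call, the 10 million timed calls take roughly 0.3 s in total, a span that the millisecond precision of `ngx.now` can resolve well.

```latex
\bar{t} = \frac{t_{\text{end}} - t_{\text{begin}}}{N},
\qquad
N = 10^{7}
\;\Rightarrow\;
t_{\text{end}} - t_{\text{begin}} \approx 30\,\text{ns} \times 10^{7} = 0.3\,\text{s}
```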
Actually, we can further make sure that no dead GC objects are hanging around before we time the code. Just insert the following line of code before the first `ngx.update_time()` call.
```lua
collectgarbage()
```
Here we force a full GC cycle before recording the begin time.
It does not help much in our example here, however, because our timed code does not create many GC objects anyway.
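To check how much garbage a piece of code actually produces, one possible approach (my own suggestion, not part of the original tutorial) is to compare `collectgarbage("count")`, which reports the Lua heap size in kilobytes, before and after running it with the collector paused. `kb_allocated` is a hypothetical helper name:

```lua
-- Hypothetical helper: approximate KB of Lua heap allocated by n calls of fn.
local function kb_allocated(fn, n)
    collectgarbage()            -- full cycle first, so dead objects are gone
    collectgarbage("stop")      -- pause the GC so nothing is freed while measuring
    local before = collectgarbage("count")
    for _ = 1, n do
        fn()
    end
    local after = collectgarbage("count")
    collectgarbage("restart")   -- resume automatic garbage collection
    return after - before
end
```

Under `resty`, with the `target` function defined in `bench.lua`, something like `ngx.say(kb_allocated(target, 1000), " KB")` would print the estimate. LuaJIT's internal objects, such as newly compiled traces, may also be counted, so small values are only approximate.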
We can make the `target` function faster by avoiding unnecessary Lua table lookup operations.
```lua
local re_find = ngx.re.find
```
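A sketch of what that change might look like, assuming the same `target` function as before: look up `ngx.re.find` once at the top of `bench.lua` and call it through a local variable (an upvalue of `target`), so each call skips the global `ngx` lookup and the `re` and `find` field lookups.

```lua
-- Look up ngx.re.find once, outside the hot path.
local re_find = ngx.re.find

-- The code under test now calls the cached function directly,
-- avoiding the global `ngx` lookup and the `.re` and `.find` field lookups.
local function target()
    re_find("hello, world.", [[\w+\.]], "jo")
end
```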
But the difference may not be measurable here.
This is all I’d like to cover today. I hope you found it interesting.
If you like this tutorial, please subscribe to this blog site and our YouTube channel. Thank you!
About This Article and Associated Video
This article and its associated video are both generated automatically from a simple screenplay file.
About The Author
Yichun Zhang is the creator of the OpenResty® open source project. He is also the founder and CEO of OpenResty Inc. He contributed a dozen open source Nginx 3rd-party modules and quite a few Nginx and LuaJIT core patches, and designed the OpenResty XRay platform.
Translations
We provide the Chinese translation for this article on blog.openresty.com.cn. We also welcome interested readers to contribute translations in other natural languages as long as the full article is translated without any omissions. We thank them in advance.
We are hiring
We always welcome talented and enthusiastic engineers to join our team at OpenResty Inc. to explore various open source software’s internals and build powerful analyzers and visualizers for real world applications built atop the open source software. If you are interested, please send your resume to talents@openresty.com. Thank you!