I needed to make a histogram of a list of timestamps to better understand the load on a server, so I decided to see if I could use Arrow and matplotlib together.
Arrow is the requests of time for Python. In other words, it is a wonderful Python module that allows the user to manipulate times and timestamps very intuitively, much like requests makes dealing with HTTP/HTTPS requests easy. matplotlib is a well-known Python module for creating beautiful plots. It is very powerful, but its API shows its age, since it is quite convoluted.
Overall, creating such a histogram is simple, and I wanted to highlight a couple of pitfalls and nice things that I encountered during the process.
You can find the full script in this gist. It expects a file with a list of timestamps (in UTC format, but it should work with other formats too) and produces a histogram of their distribution. You can find my input data on this pastebin.
Pitfalls
1. make matplotlib understand dates
matplotlib manages dates internally in a format that I don't really understand, but its API provides a date2num function that translates datetime objects into matplotlib's own date representation.
To get a datetime object from an Arrow object just do: atime.datetime.
    import matplotlib.dates as mdates

    def convert_time(atime):
        # Convert datetime objects to Matplotlib dates.
        # https://matplotlib.org/api/dates_api.html
        return mdates.date2num(atime.datetime)
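As far as I can tell, the number that date2num returns is just a float counting days since an epoch. Here is a minimal standard-library sketch of that idea, assuming the default epoch of 1970-01-01 UTC used by recent matplotlib releases (older releases counted from a different epoch). The helper name `days_since_epoch` is mine and not part of matplotlib.

```python
from datetime import datetime, timezone

# Assumption: matplotlib's default epoch in recent releases (1970-01-01 UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_since_epoch(dt):
    """Hypothetical helper: float number of days between EPOCH and dt."""
    return (dt - EPOCH).total_seconds() / 86400.0


noon_next_day = datetime(1970, 1, 2, 12, 0, tzinfo=timezone.utc)
print(days_since_epoch(noon_next_day))  # 1.5
```

This is only meant to show why the raw values look like large floats; in the script, stick to date2num so the numbers always match what matplotlib expects.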
2. print legible dates as labels for the x-axis
This does not need much comment, except that, combined with the problem above, it took me quite a while to figure out why I was getting all sorts of weird numbers as labels.
    # Choose your xtick format string
    years_fmt = mdates.DateFormatter('%Y-%m-%d %H:00')

    # format the ticks
    ax.xaxis.set_major_formatter(years_fmt)

    [...]

    # auto-rotate date labels on the x-axis
    fig.autofmt_xdate()
Nice moments
1. time spans
To create the histogram I wanted to take as minimum and maximum, respectively:
- the beginning (XX:00:00) of the hour before the smallest timestamp. So, if the minimum timestamp was 2017-12-05T17:55:37.806460+01:00, I wanted 2017-12-05T16:00:00.000000+01:00;
- the end (XX:59:59) of the hour after the greatest timestamp. So, if the maximum timestamp was 2017-12-13T19:54:37.361527+01:00, I wanted 2017-12-13T20:59:59.000000+01:00.
With Arrow all of this is quite straightforward.
    # Note: older Arrow releases accepted .replace(hours=-1) for relative
    # offsets; current releases use .shift() for that.
    start = min(timestamps).shift(hours=-1).span('hour')[0]
    end = max(timestamps).shift(hours=+1).span('hour')[1]

What the code above does:
- min(timestamps) (max(timestamps)) finds the minimum (respectively, maximum) timestamp within the list timestamps;
- .shift(hours=-1) (.shift(hours=+1)) returns a new timestamp with 1 hour subtracted from (respectively, added to) the given timestamp;
- .span('hour') returns a tuple of length 2 with the beginning and the end of the hour containing the given timestamp, so [0] ([1]) picks the beginning (respectively, the end). You can use other spans too, like 'year' and 'minute'.
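To make the span logic concrete, here is a rough standard-library equivalent of the two lines above, applied to the example timestamps from this section. It does not use Arrow at all; `hour_span` is a hypothetical helper of mine that mimics what I understand span('hour') to do.

```python
from datetime import datetime, timedelta


def hour_span(dt):
    """Hypothetical helper: (start, end) of the hour containing dt."""
    floor = dt.replace(minute=0, second=0, microsecond=0)
    # The end is the last microsecond before the next hour starts.
    ceil = floor + timedelta(hours=1) - timedelta(microseconds=1)
    return floor, ceil


first = datetime.fromisoformat("2017-12-05T17:55:37.806460+01:00")
last = datetime.fromisoformat("2017-12-13T19:54:37.361527+01:00")

# One hour back, then the start of that hour; one hour forward, then its end.
start = hour_span(first - timedelta(hours=1))[0]
end = hour_span(last + timedelta(hours=1))[1]

print(start.isoformat())  # 2017-12-05T16:00:00+01:00
print(end.isoformat())    # 2017-12-13T20:59:59.999999+01:00
```

Note that in this sketch the end of the span lands on the last microsecond of the hour rather than on XX:59:59 sharp, which makes no practical difference for the histogram.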
2. time ranges
Looping over time ranges with Arrow is very simple: the following list comprehension creates a list of timestamps between start and end with a spacing of 30 minutes.
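    # One entry per minute between start and end, converted for matplotlib;
    # the [1::30] slice keeps every 30th entry, starting from the second one.
    bins = [convert_time(abin)
            for abin in arrow.Arrow.range('minute', start, end)][1::30]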
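If it helps to see what such bin edges are used for, here is a plain-Python sketch (no Arrow, no matplotlib) that builds 30-minute edges and counts how many timestamps fall in each bin, which is roughly what a histogram computes. The helper names and the sample data are made up for illustration, and the bins are treated as half-open intervals for simplicity.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def bin_edges(start, end, step):
    """Hypothetical helper: edges from start up to end, spaced by step."""
    edges = []
    current = start
    while current <= end:
        edges.append(current)
        current += step
    return edges


def count_per_bin(timestamps, edges):
    """Count timestamps in each half-open bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for ts in timestamps:
        i = bisect_right(edges, ts) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


start = datetime(2017, 12, 5, 16, 0)
end = datetime(2017, 12, 5, 18, 0)
edges = bin_edges(start, end, timedelta(minutes=30))

# Made-up sample data, not the timestamps from the pastebin.
timestamps = [
    datetime(2017, 12, 5, 16, 10),
    datetime(2017, 12, 5, 16, 20),
    datetime(2017, 12, 5, 17, 45),
]
print(count_per_bin(timestamps, edges))  # [2, 0, 0, 1]
```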
3. personalized x-ticks with time
Timestamps are handled by matplotlib as numbers (days since an epoch), so we need to tell the plot that we want to print those numbers in a date format.
The following code assigns a DateFormatter to the x-axis so that the timestamps are shown in the format '%Y-%m-%d %H:00'. It then creates a list of timestamps from start to end, spaced by 6 hours, which are set as the ticks for the axis.
    # Choose your xtick format string
    years_fmt = mdates.DateFormatter('%Y-%m-%d %H:00')

    # format the ticks
    ax.xaxis.set_major_formatter(years_fmt)

    xticks = [convert_time(abin)
              for abin in arrow.Arrow.range('hour', start, end)][1::6]
    ax.set_xticks(xticks)
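DateFormatter takes a strftime-style format string, so you can preview what the tick labels will look like without plotting anything. The sketch below uses only the standard library, with made-up start and end values, and applies the same [1::6] slice to an hourly list.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:00'  # same format string as above

# Made-up boundaries, just for the preview.
start = datetime(2017, 12, 5, 16, 0)
end = datetime(2017, 12, 6, 20, 59, 59)

# Hourly steps from start to end, like the 'hour' range above.
hours = []
current = start
while current <= end:
    hours.append(current)
    current += timedelta(hours=1)

# Same slice as above: skip the first hour, then keep every sixth one.
labels = [h.strftime(FMT) for h in hours[1::6]]
print(labels)
# ['2017-12-05 17:00', '2017-12-05 23:00', '2017-12-06 05:00',
#  '2017-12-06 11:00', '2017-12-06 17:00']
```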
The final result
And here's the final result. Enjoy!
The image at the top is by Mrs Airwolfhound via Flickr, released under a CreativeCommons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.