TOML
In this Jupyter notebook, I give a quick introduction to TOML and explain why I like using it for log and configuration files in my workflows.
This work is based on my own opinion. Credit for all the software used and demonstrated belongs to its respective authors and the community. The linked tutorials are credited to their respective authors.
License : MIT
Links:
Other links:
- In-detail tutorial by Real Python
- Quick tutorial by LinuxHint (*warning: last updated over 2 years ago!*)
- Skeptic review "What is wrong with TOML?" by HitchDev (*warning: publishing date not mentioned, read at your own risk.*)
- Python package I will use in this notebook by GitHub user uiri. (*warning: despite being one of the most popular TOML packages for Python, purely based on counting GitHub stars, it doesn't use the latest version of TOML. Check the official TOML wiki to download the latest Python implementation.*)
TOML is short for "Tom's Obvious, Minimal Language". It was developed by GitHub co-founder Tom Preston-Werner almost 9 years ago, and it has gained popularity in recent years, as is evident from the numerous implementations in numerous languages.
I am not going to dive into comparing different config file formats such as INI, JSON, YAML, etc., because plenty of blogs, Reddit threads, and so on already do an amazing job of it. Here's an example by Don Parakin.
Moving on to how to use it!
#imports
import numpy as np
import toml
model_name = "Best_model_ever"
balanced_accuracy_cv = [0.83, 0.86, 0.81, 0.81, 0.83]
kappa_cv = [0.83, 0.86, 0.81, 0.81, 0.83]
folds_cv = 5
mean_kappa_cv = 0.65
mean_balanced_accuracy_cv = 0.83
balanced_accuracy_test = 0.83
cohen_kappa_test = 0.66
confusion_matrix = [[129, 20], [23, 85]]
toml_string = f"""
#Model logs in TOML-formatted data
Model_Name = "{model_name}"
Numbers_of_Folds = {folds_cv}
[Balanced_Accuracy_CV_Metrics]
Balanced_Accuracy_of_all_Folds = {balanced_accuracy_cv}
Mean_Balanced_Accuracy_CV = {mean_balanced_accuracy_cv}
Standard_Deviation_Mean_Balanced_Accuracy_CV = {np.round(np.std(balanced_accuracy_cv), decimals=2)}
[Cohens_Kappa_CV_Metrics]
Cohens_Kappa_of_all_Folds = {kappa_cv}
Mean_Cohens_Kappa_CV = {mean_kappa_cv}
Standard_Deviation_Mean_Cohens_Kappa_CV = {np.round(np.std(kappa_cv), decimals=2)}
[Test_Set_Metrics]
Balanced_Accuracy_Test_Set = {balanced_accuracy_test}
Cohens_Kappa_Test_Set = {cohen_kappa_test}
Confusion_Matrix_Test_Set = {confusion_matrix}
"""
print(toml_string)
#Save it the same way a text file is saved, but with the .toml extension
with open('test_toml.toml', 'w') as f:
    f.write(toml_string)
Visual inspection of the TOML file
#To load the TOML file now as a dictionary
model_log = toml.load('test_toml.toml')
model_log
The comment is lost when loading this way, because the parser discards comments and the resulting Python dict holds only keys and values.
print(model_log['Balanced_Accuracy_CV_Metrics'])
print(type(model_log['Balanced_Accuracy_CV_Metrics']['Balanced_Accuracy_of_all_Folds']))
print(type(model_log['Balanced_Accuracy_CV_Metrics']['Balanced_Accuracy_of_all_Folds'][0]))
#It is also possible to convert a dictionary to a TOML-formatted string
model_toml_string = toml.dumps(model_log)
print(model_toml_string)
#The trailing comma in each array is valid TOML and disappears when the string is parsed again
#Speaking of re-reading, it is also possible to save the dictionary directly as a TOML file
with open('test_toml_2.toml', 'w') as f:
    toml.dump(model_log, f)
Visual inspection of the TOML file.
#Or to simply convert a TOML string to a dictionary
model_toml_string_direct = toml.loads(toml_string)
model_toml_string_direct
End words:
I hope you feel motivated to keep configs and logs of your work and models. It is good practice, because it makes your work easier to reproduce, both for you when you look back at it later and for others when you submit to a journal, and it is also incredibly convenient.