How effective is OCR of BPMN elements?

Despite the variety of available modeling tools, many analysts still prefer to draw process diagrams by hand, especially in the initial phases of process discovery. Modeling diagrams “by pen” may offer benefits similar to those of handwriting over typing (infographics), as also discussed in one of my older posts.

Handwritten BPMN diagrams still appear in initial phases of process discovery

At some point, however, handwritten diagrams usually need to be transformed into corresponding digital versions (e.g. diagram.bpmn), which offer several benefits, including direct execution of business process models. This usually requires re-modeling the handwritten diagrams in a modeling tool.

Digital versions of BPMN diagrams/models serve several purposes, including process automation (source: camunda.com)

A more efficient alternative to re-modeling would be optical character recognition (OCR), similar to how paper-based documents are transformed into digital ones (i.e. digitizing process diagrams). However, since dedicated solutions for digitizing BPMN diagrams (i.e. optical recognition of the BPMN visual vocabulary) are lacking, we are investigating how effective the digitization of BPMN elements can be. For this purpose, a “learning set” of handwritten BPMN elements was created by 50+ subjects, who were instructed to replicate standardized BPMN 2.0 symbols by hand.

An example of a completed questionnaire (handwritten symbols next to standardized ones)

Besides drawing BPMN symbols, subjects were instructed to write out the Latin alphabet and to record the time taken to complete the questionnaire. This lets us search for correlations between (1) the “quality” of handwriting and the “quality” of the drawn BPMN symbols (measured via the effectiveness of OCR) and (2) the speed of drawing and the “quality” of drawing. To minimize learning and fatigue effects, the order of elements was randomized in individual questionnaires.
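
As an illustration of how such a correlation could be checked, here is a minimal sketch of a Spearman rank correlation in plain Python. It is not the analysis used in the study; the choice of Spearman, the function names and the per-subject scores in the usage example are all assumptions made for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-subject scores (NOT data from the study):
# share of correctly recognized letters vs. BPMN symbols.
letter_scores = [0.90, 0.75, 0.60, 0.95, 0.80]
symbol_scores = [0.70, 0.65, 0.40, 0.85, 0.60]
print(round(spearman(letter_scores, symbol_scores), 3))
```

A rank-based coefficient is used in this sketch because recognition rates are bounded and need not be linearly related; the same helper could be applied to drawing time versus symbol scores.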

An excerpt from clustered qualitative results (a single BPMN element – ‘receive message task’)

Preliminary results

While the research is still in progress, some preliminary results are already available. The following table presents how effectively OCR recognized the BPMN elements analyzed so far. OCR was implemented with TensorFlow, an end-to-end open source platform for machine learning.

Effectiveness of OCR of selected BPMN elements
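
For readers who want to reproduce this kind of table, a per-element recognition rate could be computed roughly as follows. This is a sketch under the assumption that each test sample is available as a (true element, recognized element) pair; the function name and the sample labels are hypothetical and do not come from the study.

```python
from collections import defaultdict


def recognition_rates(samples):
    """Map each true element to the share of its samples recognized correctly.

    `samples` is an iterable of (true_label, predicted_label) pairs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true_label, predicted_label in samples:
        totals[true_label] += 1
        if predicted_label == true_label:
            correct[true_label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical recognition results (NOT data from the study).
samples = [
    ("exclusive gateway", "exclusive gateway"),
    ("exclusive gateway", "exclusive gateway"),
    ("manual task", "user task"),
    ("manual task", "manual task"),
]
for element, rate in recognition_rates(samples).items():
    print(f"{element}: {rate:.0%}")
```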

As evident from the table above, the exclusive gateway (XOR), which is also one of the most commonly used BPMN elements, was recognized most effectively, whereas the “Manual task” (a BPMN element that is more common in operational process diagrams) was recognized least effectively. The following table provides some insight into what the least effectively recognized BPMN elements were (wrongly) recognized as.

The least effectively recognized BPMN elements and their wrong results (i.e. the elements recognized instead)

As evident from the table above, the least effectively recognized BPMN elements were mainly mistaken for other BPMN elements of the same type (i.e. shape). Similar OCR results were also obtained for different event types with the same trigger (e.g. intermediate message event and start message event), whereas different types of triggers were confused with each other in both directions (e.g. error recognized as escalation and escalation recognized as error).
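
Substitutions like these can be surfaced by counting the most frequent (true element, recognized element) mismatches. The sketch below assumes the recognition results are available as label pairs; the function name and the sample labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def top_confusions(samples, n=3):
    """Return the n most frequent (true, predicted) pairs where they differ."""
    wrong = Counter(
        (true_label, predicted_label)
        for true_label, predicted_label in samples
        if true_label != predicted_label
    )
    return wrong.most_common(n)


# Hypothetical recognition results (NOT data from the study).
results = [
    ("error event", "escalation event"),
    ("error event", "escalation event"),
    ("error event", "error event"),
    ("escalation event", "error event"),
    ("manual task", "user task"),
    ("manual task", "user task"),
    ("manual task", "user task"),
]
for (true_label, predicted_label), count in top_confusions(results):
    print(f"{true_label} recognized as {predicted_label}: {count}x")
```

Seeing a pair and its mirror image in the same list (error as escalation, escalation as error) would be the mutual confusion described above.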

As noted in the acknowledgments below, this is work in progress, so more results will follow.

Acknowledgments

Credit goes to Ms. Slavica Jagečić, who is investigating this topic in her master's thesis under my supervision; more results will be available after her thesis defense. Credit also goes to the ICT students who voluntarily participated in the research by completing the questionnaire.


 
