Scribbling Interfaces

About

Inspired by Airbnb's Sketching Interfaces and driven by a personal interest in design tools, I used Umeå Institute of Design's 2-week Machine Learning course to explore how machine learning could enhance design tools and workflows.

The result is a Figma plugin prototype that turns hand-drawn low-fidelity wireframes into higher-fidelity mockups. Instead of generating arbitrary and opinionated visual elements, the plugin links the output to an existing design system (Figma components), allowing for quick iterations with a coherent design language.

For the prototype, I trained an object-recognition model using YOLOv5 and ran it with TensorFlow.js. The plugin is currently not published, as I haven't found a good way to host or ship the trained model with the plugin.

Approach

To train a custom object-recognition model, I created a dataset of 500 hand-drawn wireframes, each combining up to three different element types (images, buttons and text placeholders).¹

The wireframes were drawn both digitally and on paper.

With only one person contributing to the dataset, the model was closely tailored to my own strokes and turned out to be less accurate when someone else was sketching. I tried to compensate for this by augmenting² the wireframes, which brought the dataset to around 1500 images and made the model slightly more robust.
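To make the augmentation step more concrete, here is a minimal sketch of one part of it: when an image is skewed, its bounding-box labels have to be transformed the same way, otherwise the training labels no longer match the drawing. The sketch assumes a horizontal shear in pixel coordinates. `LabelBox` and `shearLabel` are hypothetical names for illustration and not the tooling actually used for this dataset.

```typescript
// Hypothetical label type: top-left corner plus size, in pixels.
interface LabelBox {
  label: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Apply a horizontal shear (x' = x + k * y) to a label and return the
// axis-aligned box that encloses the four sheared corners.
// A horizontal shear leaves y and height unchanged.
function shearLabel(box: LabelBox, k: number): LabelBox {
  const xs = [box.x, box.x + box.width];
  const ys = [box.y, box.y + box.height];
  const shearedXs: number[] = [];
  for (const x of xs) {
    for (const y of ys) {
      shearedXs.push(x + k * y);
    }
  }
  const minX = Math.min(...shearedXs);
  const maxX = Math.max(...shearedXs);
  return { ...box, x: minX, width: maxX - minX };
}
```

Note that the enclosing box grows wider as the shear factor increases, so strong skews make the labels looser around the drawn element.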

Putting things together

After training the model for several generations³ and running a few predictions on a local machine, I wrapped the model in a small web app so friends and colleagues could try it out.

I used a napkin as an analogy for turning rough ideas into higher-fidelity designs.

All in all, this project was quite a 'learn-as-you-go' experience and was only made possible by excellent mentors and resources⁴ from the ML community.

How does it work?

Running inference with a YOLO object-recognition model takes three steps: 1.) the model is fed a source image, 2.) it does some inference magic and 3.) it eventually returns the image along with bounding boxes for every predicted element.
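Part of the "inference magic" in step 2 is typically a post-processing pass: YOLO-style detectors emit many overlapping candidate boxes, which are usually filtered by confidence and then reduced with non-maximum suppression. The following is a minimal sketch of that idea, not the code used in the prototype. The `Detection` shape and the threshold defaults are assumptions.

```typescript
// Hypothetical shape of one raw prediction (top-left corner, pixels).
interface Detection {
  label: string;
  score: number; // confidence in [0, 1]
  x: number;
  y: number;
  width: number;
  height: number;
}

// Intersection over union of two boxes: 0 = disjoint, 1 = identical.
function iou(a: Detection, b: Detection): number {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  const inter = Math.max(0, right - left) * Math.max(0, bottom - top);
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}

// Drop low-confidence boxes, then keep only the highest-scoring box
// among same-label boxes that overlap more than maxIou.
function filterDetections(
  detections: Detection[],
  minScore = 0.5, // assumed threshold
  maxIou = 0.45 // assumed threshold
): Detection[] {
  const sorted = detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of sorted) {
    const overlapsKept = kept.some(
      (k) => k.label === d.label && iou(k, d) > maxIou
    );
    if (!overlapsKept) kept.push(d);
  }
  return kept;
}
```

Whatever survives this pass is the list of bounding boxes that the third step hands back.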

A high-level overview of the original input, the segmented output with its bounding boxes and the synthesized mockups.

These bounding boxes provide us with the width, height and x/y position of every detected element, or in other words, a blueprint we can use to re-draw the image. All we have to do is substitute each bounding box with its respective detected element et voilà, we have magically increased the fidelity of our wireframe.
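The substitution described above can be sketched as a plain mapping from detected elements to component placements. This is an illustrative sketch under assumptions: the component names in `COMPONENT_FOR_LABEL`, the `scale` parameter and the type names are hypothetical, and a real design system would bring its own components.

```typescript
// The three element types from the dataset.
type ElementLabel = "image" | "button" | "text";

interface DetectedElement {
  label: ElementLabel;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Where to put which component on the canvas.
interface Placement {
  componentName: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical component names, one per detectable element type.
const COMPONENT_FOR_LABEL: Record<ElementLabel, string> = {
  image: "Image Placeholder",
  button: "Button",
  text: "Text Block",
};

// Translate each bounding box 1:1 into a placement.
// `scale` maps model-input pixels to canvas units (1 = unchanged).
function toPlacements(elements: DetectedElement[], scale = 1): Placement[] {
  return elements.map((el) => ({
    componentName: COMPONENT_FOR_LABEL[el.label],
    x: Math.round(el.x * scale),
    y: Math.round(el.y * scale),
    width: Math.round(el.width * scale),
    height: Math.round(el.height * scale),
  }));
}
```

In a Figma plugin, each placement could then be applied by creating an instance of the matching component with `createInstance()`, setting its `x` and `y`, and calling `resize(width, height)`.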

Visualizing the bounding boxes predicted by the ML model.

In the initial napkin version, the translation of scribbles to wireframes was very naïve: it translated the bounding boxes 1:1, without paying attention to e.g. alignment or layout. While this can be improved by aligning items in close proximity (as seen in the Figma plugin), there are certainly more sophisticated solutions, which could e.g. use Figma's built-in layout features. Definitely something to work on for a next version.
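One simple way to align items in close proximity is to snap left edges that sit within a few pixels of each other to a shared value. The sketch below shows that idea only. It is not the plugin's actual alignment logic, and `LayoutBox`, `snapLeftEdges` and the default tolerance are assumptions.

```typescript
// Hypothetical box type: top-left corner plus size.
interface LayoutBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Snap left edges that lie within `tolerance` of a group's leftmost
// edge to that edge. Returns new boxes in the original order.
function snapLeftEdges(boxes: LayoutBox[], tolerance = 8): LayoutBox[] {
  // Visit boxes from left to right without reordering the input.
  const order = boxes.map((_, i) => i).sort((a, b) => boxes[a].x - boxes[b].x);
  const result = boxes.map((b) => ({ ...b }));
  let anchor = Number.NEGATIVE_INFINITY;
  for (const i of order) {
    // Too far from the current group's edge: start a new group here.
    if (boxes[i].x - anchor > tolerance) {
      anchor = boxes[i].x;
    }
    result[i].x = anchor;
  }
  return result;
}
```

The same approach works for top edges or widths. Measuring against the group's first edge rather than the previous box keeps a long chain of slightly offset items from drifting into one group.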

Context

This project was a deliverable of Umeå Institute of Design's Experience Prototyping course and 2-week Machine Learning module, led and mentored by the excellent Jen Sykes and Andreas Refsgaard.

¹ For the sake of keeping this page concise, I'm not diving into technical details or rationales. YOLOv5 was chosen due to its relative ease of use and because similar work used it and vouched for it, see footnote 4.
² In this context, augmentation refers to creating altered copies of the source image, e.g. by skewing and distorting it, in order to increase the size of the dataset.
³ A generation, or more accurately an epoch, refers to one training cycle of a machine learning algorithm. Put simply, more epochs give the algorithm more time to learn, which generally improves its precision up to a point.
⁴ Among others: Live Web Prototypes from Hand-Drawn Mockups (2019)