<div dir="ltr"><div><b>End-User Programming of Virtual Assistants Skills with Stylish Graphical User Interfaces</b><br></div>Michael Fischer<br><div>Advised by Professor Monica Lam</div><div>Computer Science Department<br></div><div><br></div><div>Oral Exam</div><div></div><div></div><div>Public Session: Noon – 1:00PM PST<br></div><div>Thursday, June 11th, 2020</div><div>Location: <a href="https://stanford.zoom.us/j/95233558336?pwd=eExOa1FOMWVqWXkxNm9Ud0hSOFhPUT09" target="_blank">https://stanford.zoom.us/j/95233558336?pwd=eExOa1FOMWVqWXkxNm9Ud0hSOFhPUT09<br></a><b><br></b></div><div><b>Summary</b><br>Virtual assistants give end-users access to their devices and web data through hundreds of thousands of predefined skills. Nonetheless, there is still a long tail of personal digital tasks that individuals wish to automate.  This thesis explores how end-users can define useful personalized skills with designer-like interfaces, all without learning a formal programming language.</div><div><br></div><div>Our system enables end-users to develop virtual assistant skills in their web browser by capturing what they say, type, and click on.  This system is the first program-by-demonstration system that produces programs with control constructs.  The system gives the user an easy-to-learn multimodal interface and generates code in a formal programming language that supports parameterization, function invocation, conditionals, and iterative execution.</div><div><br></div><div>We show that a virtual assistant skill can greatly benefit from having a graphical interface, because users can monitor multiple queries simultaneously, re-run skills easily, and adjust settings using multiple modes of interaction.  We developed a system that automatically translates a user’s voice command into a reusable skill with a graphical user interface.  
Unlike the formulaic interfaces generated by prior state-of-the-art work, we generate interfaces that are visually interesting and diverse by using a novel template-based approach.</div><div><br></div><div>To improve the aesthetics of graphical user interfaces, we use a technique called style transfer, a method for applying the style of one image to another.  We show that the previous formulation of style transfer cannot retain structure in an image, which causes the output to lack definition and legibility and renders restyled interfaces unusable. Our purely neural-network-based solution captures structure through the uncentered cross-covariance between features across different layers of a convolutional neural network. By minimizing the loss between the style and output images, our technology retains structure while generating results with texture in the background, shadow and contrast at the borders, and consistency of design across edges.</div><div><br></div><div>In summary, our system enables end-users to create web-based skills with designer-like, automatically generated graphical user interfaces.  </div><div><div class="gmail-yj6qo"></div><div class="gmail-adL"><br></div></div><div class="gmail-adL"><div><img src="cid:ii_kb9xfz1x0" alt="ad.png" width="489" height="562"><br></div><div>Image from the <a href="https://mrs.stanford.edu/art-science-2020-exhibition#videos">United States vs. Alexa</a> project</div></div></div>