Teaching an AI to invent new Chinese characters

For associated code, please see the github repo. Huge shoutout to my old friend Daniel Tse, linguist and ML expert extraordinaire for invaluable help and ideas on both fronts throughout this campaign. ...

May 26, 2024 · 21 min · 4435 words · Yue Wu

Optical flow timelapse stabiliser

For associated code, please see the Jupyter notebook in the github repository While machine learning has been very successful in image-processing applications, there’s still a place for traditional computer vision techniques. One of my hobbies is making timelapse videos by taking photos every 10 minutes for many days, weeks or even months. Over this time scale, environmental effects such as thermal expansion from the day-night cycle can introduce period offsets into the footage. I have a backlog of timelapse footage that is unwatchable due to excessive shifts of this nature. In the past I’ve used the Blender VFX stabilization tool to stabilize on a feature, but this breaks in e.g. day-night cycles or subjects moving in front of the feature. I finally got round to writing my code to do this that doesn’t rely on specific feature tracking. ...

February 14, 2024 · 4 min · 737 words · Yue Wu

Chemical Diffusion

Generative text-to-image models have recent become very popular. Having a bunch of fully captioned images left over from the This JACS Does Not Exist project, I’ve trained a Stable Diffusion checkpoint on that dataset (~60K JACS table-of-contents images with matched paper titles). It seems to work pretty well. Here are some examples of prompts and the resulting images generated: “Development of a Highly Efficient and Selective Catalytic Enantioselective Hydrogenation for Organic Synthesis” ...

February 15, 2023 · 2 min · 341 words · Yue Wu

Training T5 models and generating text

For the language-to-language components of this JACS does not exist, I chose Google’s T5 (text-to-text transfer transformer) as a recent cutting-edge text sequence to sequence model. I had already scraped all the JACS titles and abstracts, so training data was readily available. The first task was to generate somewhat-convincing abstracts from titles to increase the entertainment value of TJDNE. As abstracts have a maximum length, I wanted to make sure that a whole abstract would be included in T5’s maximum input length of 512 tokens so that end-of-sequence locations could be determined. Here’s a histogram of the length distribution of the abstracts tokenized with the T5 tokenizer. ...

August 25, 2022 · 3 min · 465 words · Yue Wu

Abstract2title

Many people enjoyed title2abstract from the this JACS does not exist project so I inverted the training parameters for a quick follow up. Presenting abstract2title: You can also test the title2abstract and toc2title apps in my previous post.

August 22, 2022 · 1 min · 38 words · Yue Wu

This JACS does not exist

An imaginary abstract generated at thisJACSdoesnotexist.com In academic chemistry, authors submit a promotional table-of-contents (ToC) image when publishing research papers in most journals. These fascinate me as they are one of the few places where unfettered self expression is tolerated, if not condoned. (See e.g. TOC ROFL, of which I am a multiple inductee) In general, though, ToC images follow a fairly consistent visual language, with distinct conventions followed in different subfields. This presents an vaguely plausible excuse to train some machine learning models to generate ToCs and their accompanying paraphenalia. In this project, I use ToC images, titles and abstracts from one of the most longest running and well-known chemistry journals, the Journal of the American Chemical Society (JACS) as a dataset to train: ...

August 1, 2022 · 10 min · 1939 words · Yue Wu