Machine Learning Finds the Right Photosensitizer in 4 Rounds

A new ML framework from Nagoya University uses transfer learning to guide organic photosensitizer screening, cutting the mean number of experimental rounds needed from about 25 (with standard Bayesian optimization) to about 4, across a broad range of photoreactions, most of them run in the EvoluChem PhotoRedOx Box.

Choosing the right organic photosensitizer for a new photoreaction is still largely a matter of empirical screening. A chemist with deep experience in photoredox catalysis can make educated guesses, but for most labs, the process involves testing a panel of candidates and hoping the best one turns up early. With 60 or more commercially available organic photosensitizers to choose from, and reaction outcomes that depend on a combination of excited-state energy, redox potentials, and substrate compatibility, systematic optimization is time-consuming and not always tractable.

A new paper in Cell Reports Physical Science from Haruki Ikemura, Naoki Noto, and colleagues at Nagoya University and collaborating institutions takes a machine learning approach to this problem. Their work, “Machine-learning-guided screening of organic photosensitizers accelerated by domain adaptation,” describes a screening framework that uses transfer learning to carry knowledge from previously characterized photoreactions into the search for the optimal photosensitizer for a new one, even when the two reactions differ substantially in type or mechanism.

The problem with standard ML-guided screening

Machine learning approaches to reaction optimization, particularly Bayesian optimization (BO), have gained considerable traction in recent years. The general strategy is to build a surrogate model that approximates the relationship between reaction conditions and outcome, then use that model to select the next candidate to test. Over several rounds, the model converges on the best conditions with fewer experiments than a random or intuition-guided screen would require.
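The loop described above can be sketched in a few lines of Python. This is a toy illustration, not the method from the paper: the surrogate is a simple inverse-distance-weighted average rather than the probabilistic model a real Bayesian optimizer would use, the selection rule is purely greedy (a true BO acquisition function also weighs model uncertainty), and the descriptors and yields are hypothetical values invented for the example.

```python
import math

def predict(x, observed):
    """Toy surrogate: inverse-distance-weighted mean of the yields seen so far."""
    num = den = 0.0
    for xi, yi in observed:
        w = 1.0 / (math.dist(x, xi) + 1e-9)
        num += w * yi
        den += w
    return num / den

def rounds_to_find_best(descriptors, yields, start):
    """Test one candidate per round; return the round in which the best one is tested."""
    best = max(range(len(yields)), key=yields.__getitem__)
    tested = [start]
    while best not in tested:
        observed = [(descriptors[i], yields[i]) for i in tested]
        untested = [i for i in range(len(yields)) if i not in tested]
        # Greedy selection: pick the untested candidate with the highest prediction.
        nxt = max(untested, key=lambda i: predict(descriptors[i], observed))
        tested.append(nxt)
    return len(tested)

# Hypothetical toy library: 10 candidates, one descriptor each, yield peaking near 0.7.
descriptors = [(i / 9,) for i in range(10)]
yields = [100 * math.exp(-((x[0] - 0.7) ** 2) / 0.05) for x in descriptors]

# One run per possible starting candidate, then average the round counts.
rounds = [rounds_to_find_best(descriptors, yields, s) for s in range(len(yields))]
mean_rounds = sum(rounds) / len(rounds)
print(rounds, mean_rounds)
```

Running one search per starting candidate and averaging the round counts is one way to keep a lucky or unlucky first pick from dominating the comparison between methods.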

The limitation is that most ML-based screening frameworks start from scratch for each new reaction. Every new substrate or reaction class requires building a fresh dataset, and the surrogate model has no access to accumulated knowledge from prior campaigns. This is the opposite of how an experienced chemist works, and it is a meaningful inefficiency when the cost of each experiment is non-trivial.

Transfer learning (TL) addresses this directly by allowing a model trained on one domain to improve its performance on a related but different domain. The challenge is determining what counts as “related” when photoreaction mechanisms can differ substantially, and how to weight prior data that may be partially but not entirely informative.

Domain adaptation: letting the data decide what is useful

The Ikemura and Noto group’s approach uses a specific form of transfer learning called domain adaptation (DA). The algorithm they employ, TrAdaBoost.R2 (TrAB), is an instance-based DA method that dynamically reduces the weight of uninformative data points in the source domain rather than treating all prior reaction data equally. Combined with a correlation-analysis step that automatically selects which prior photoreactions are most relevant to the new target reaction, the system extracts useful predictive signal from a diverse database without being misled by data that is present in the database but uninformative for the target reaction.
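The two ideas in this paragraph, picking relevant source reactions by correlation and shrinking the weight of source points that predict poorly, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the use of Pearson correlation, the 0.5 threshold, the fixed shrink factor `beta`, the function names, and the reaction labels and yields are hypothetical and are not taken from the paper or its code. The full algorithm also updates the target-domain weights and wraps this step in a boosting loop; neither is shown.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length yield lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def select_sources(target_yields, source_table, threshold=0.5):
    """Keep source reactions whose yields track the target on the tested candidates."""
    return [name for name, ys in source_table.items()
            if pearson(target_yields, ys) >= threshold]

def reweight_source(weights, y_true, y_pred, beta=0.5):
    """Shrink each source weight by beta ** (scaled error), then renormalize."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(errors) or 1.0          # avoid dividing by zero when all errors are 0
    scaled = [e / max_err for e in errors]
    new = [w * beta ** e for w, e in zip(weights, scaled)]
    total = sum(new)
    return [w / total for w in new]

# Hypothetical yields for three photosensitizers already tested in the target reaction.
target = [10, 40, 80]
sources = {"PR_a": [5, 35, 90], "PR_b": [70, 30, 10]}
selected = select_sources(target, sources)

# Hypothetical source points: the worse the model predicts a point, the less it counts.
new_weights = reweight_source([0.25] * 4, [50, 50, 50, 50], [50, 40, 30, 10])
print(selected, new_weights)
```

In this toy case only the source reaction whose yields rise and fall with the target survives the selection, and the source point with the largest prediction error ends up with half the weight of the point predicted exactly.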

The database underlying the framework covers 60 organic photosensitizers tested across 15 photoreactions spanning a wide range of reaction types and mechanisms:

  • Nickel-catalyzed C–O, C–S, and C–N bond-forming reactions (PR1–PR6)
  • [2+2] cycloaddition via energy transfer (PR7)
  • Radical additions to 1,1-diaryl olefins (PR8–PR12)
  • Copper-catalyzed monofluoromethylation (PR13)
  • Photocatalytic phosphine cyclization via a radical-chain mechanism (PR14)
  • Three-component acylsilylation of styrene (PR15)

The 15 reactions were deliberately chosen to include both simple, well-characterized transformations and mechanistically complex ones. PR14 and PR15 in particular involve intricate multi-step radical and polar pathways that differ substantially from the nickel-catalyzed cross-couplings in PR1–PR6. Demonstrating that TL works across these boundaries was a central goal of the study.

The efficiency gain

The headline result is a substantial reduction in the number of experimental rounds needed to identify the optimal photosensitizer for a target reaction. Across 60 independent runs per method (one per possible starting point, to remove initialization bias), the three approaches compared as follows:

  • Standard Bayesian optimization: 24.6 rounds (mean)
  • Random forest optimization: 12.6 rounds (mean)
  • TrAdaBoost.R2 (domain-adapted TL): 3.9 rounds (mean)

Rounds required to identify the optimal organic photosensitizer (OPS) across 60 independent runs. Each round tests one OPS.
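To put the reported means in perspective, the short Python calculation below converts them into the share of the 60-compound library tested and the ratio to the transfer-learning result. It uses only the three means quoted above and assumes that no photosensitizer is tested twice within a run, so rounds and compounds tested are the same number.

```python
LIBRARY_SIZE = 60  # photosensitizers in the database

# Mean rounds to reach the optimum, as reported above.
mean_rounds = {
    "Standard Bayesian optimization": 24.6,
    "Random forest optimization": 12.6,
    "TrAdaBoost.R2": 3.9,
}

tl_rounds = mean_rounds["TrAdaBoost.R2"]
share_tested = {m: r / LIBRARY_SIZE for m, r in mean_rounds.items()}
ratio_to_tl = {m: r / tl_rounds for m, r in mean_rounds.items()}

for method in mean_rounds:
    print(f"{method}: {share_tested[method]:.1%} of the library, "
          f"{ratio_to_tl[method]:.1f}x the TL rounds")
```

On these figures, the transfer-learning search tests about 6.5% of the library on average, compared with 41% for standard Bayesian optimization, a roughly 6.3-fold difference in rounds.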

The TL-based system reached the optimal photosensitizer in under 4 rounds on average, with the average taken over every possible starting photosensitizer. This robustness to initialization is practically important: standard BO can perform well when it happens to start near the optimum, but its performance degrades significantly with unlucky starting points. The TrAB-based system shows much smaller variance across starting conditions.

Finding what is not in the training data

One of the most demanding tests for any transfer learning system is identifying a high-performing candidate that was not present in the source domain at all. This is a realistic scenario in photochemistry, where new classes of organic photosensitizers continue to be developed and a chemist may want to evaluate a compound that has no historical performance data across existing reactions.

The authors address this directly by introducing OPS61, a photosensitizer absent from the original 60-compound database, into the screening campaign for one of the target reactions. The TL-based framework successfully identified OPS61 as a top performer, while standard BO required substantially more rounds to locate it. This result extends the practical scope of the approach: it is not limited to selecting among well-characterized compounds but can guide exploration into genuinely new chemical space.

Running the screening in the EvoluChem PhotoRedOx Box

The photocatalytic activity data underpinning the entire framework was generated experimentally. For 13 of the 15 photoreactions in the database (PR1–PR13), the reactions were carried out in the EvoluChem PhotoRedOx Box with 450 nm or 425 nm LEDs. The EvoluChem PhotoRedOx Box Duo was used for reaction PR15.

This is worth noting for a specific reason. Machine learning models are only as good as the data they are trained on. In a photocatalytic screening context, that means yield measurements across all 60 OPS–reaction combinations need to be comparable: same photon flux, same geometry, same irradiation conditions from one vial to the next. The PhotoRedOx Box’s patented chamber design distributes light evenly across all vial positions, and the EvoluChem LED ecosystem provides calibrated, narrow-band irradiation at defined wavelengths. The reliability of the dataset that made this ML framework possible is directly tied to the reproducibility of the photoreactor platform it was built on.

Implications for the field

What Ikemura, Noto, and colleagues have built is not a tool tied to a specific reaction class. The source database spans nickel photoredox, energy-transfer cycloadditions, radical-polar crossover reactions, copper catalysis, and complex multi-component sequences. The demonstration that useful predictive information can be transferred across these mechanistically distinct reactions is the conceptually important result. It suggests that a growing community database of photocatalytic activity data, generated under standardized conditions, could serve as a shared resource that makes each new photoreaction easier and faster to optimize.

The database and code are fully open-access, available at the project’s GitHub repository and archived on Zenodo. Research groups looking to apply the framework to their own photoreactions can supply their own yield data and benefit from the accumulated source domain.

For laboratories already running photoredox chemistry on standardized equipment, this paper points toward a future where the data generated by routine screening campaigns has compounding value: each new reaction screened on the PhotoRedOx Box platform adds to a dataset that makes the next screen faster.

Explore the EvoluChem PhotoRedOx Box

The standardized photoreactor platform behind 13 of the 15 reactions in this study. Interchangeable LEDs from 365 nm to 808 nm, even illumination across all vial positions, compatible with any stir plate.


Reference: H. Ikemura, N. Noto, R. Akiba, T. Rohlfs, Y. Masuda, Y. Sumida, O. García Mancheño, T. Hosoya, H. Ohmiya, M. Sawamura, S. Saito, Cell Rep. Phys. Sci. 2026, 7, 103141. DOI: 10.1016/j.xcrp.2026.103141

Equipment used: EvoluChem PhotoRedOx Box (450 nm and 425 nm LEDs, PR1–PR13); EvoluChem PhotoRedOx Box Duo (PR15).

Data and code: github.com/Naoki-Noto/P9-20250104-HI

 
