Code Generation

As hand-writing code for a lot of drivers in multiple languages would be quite a nightmare, we have invested a very large amount of time into finding a way to automate this.

So in the end we need 3 parts:

  1. Protocol definition

  2. Language template

  3. A maven plugin which generates the code

This maven plugin uses a given protocol definition as well as a language template and generates code for reading/writing data in that protocol with the given language.

code-generation-intro

The Types Base module provides all the structures the Protocol modules output which are then used in the Language templates to generate code.

Protocol Base and Language Base hereby just provide the interfaces that reference these types and provide the API for the plc4x-maven-plugin to use.

These modules are also maintained in a repository which is separate from the rest of the PLC4X code.

This is generally only due to some restrictions in the Maven build system. If you are interested in understanding the reasons - please read the chapter on Problems with Maven near the end of this page.

Concrete protocol spec parsers, code generators as well as templates that actually generate code are implemented in derived modules all located under the code-generation part of the main project repository.

We didn’t want to tie ourselves to only one way to specify protocols and to generate code. Generally multiple types of formats for specifying drivers are thinkable and the same way, multiple ways of generating code are possible. Currently, however we only have one parser: MSpec and one generator: Freemarker.

These add more layers to the hierarchy.

So for example in case of generating a Siemens S7 Driver for Java this would look like this:

code-generation-intro-s7-java

The dark blue parts are the ones released externally, the turquoise ones are part of the main PLC4X repo.

Introduction

The maven plugin is built up very modular.

So in general it is possible to add new forms of providing protocol definitions as well as language templates.

For the formats of specifying a protocol we have tried out numerous tools and frameworks, however the results were never quite satisfying.

Usually using them required a large amount of workarounds, which made the solution quite complicated. This is mainly the result, that tools like Thrift, Avro, GRPc, …​ all are made for transferring an object structure from A to B. They lay focus on keeping the structure of the object in takt and not offer ways to control the format for transferring them.

Existing industry standards, such as ASN.1 unfortunately mostly relied on large portions of text to describe part of the parsing or serializing logic, which made it pretty much useless for a fully automated code genration.

In the end only DFDL and the corresponding Apache project Apache Daffodil seemed to provide what we were looking for.

With this we were able to provide first driver versions fully specified in XML.

The downside was, that the PLC4X community regarded this XML format as pretty complicated and when implementing an experimental code generator we quickly noticed that generating a nice object model would not be possible, due to the lack of an ability to model inheritance of types into a DFDL schema.

In the end we came up with our own format which we called MSpec and is described in the MSpec Format description.

Configuration

The plc4x-maven-plugin has a very limited set of configuration options.

In general all you need to specify, is the protocolName and the languageName.

An additional option outputFlavor allows generating multiple versions of a driver for a given language. This can come in handy if we want to be able to generate read-only or passive mode driver variants.

In order to be able to refactor and improve protocol specifications without having to update all drivers for a given protocol, we recently added a protocolVersion attribute, that allows us to provide and use multiple versions of one protocol. So in case of us updating the fictional wombat-protocol, we could add a version 2 mspec for that, then use the version 2 in the java-driver and continue to use version 1 in all other languages. Once all drivers are updated we could eliminate the version again.

Last, not least, we have a pretty generic options config option, which is a Map type.

With options is it possible to pass generic options to the code-generation. So if a driver or language requires further customization, these options can be used. For a list of all supported options for a given language template, please refer to the corresponding language page.

Currently, the Java module makes use of such an option for specifying the Java package the generated code uses. If no package option is provided, the default package org.apache.plc4x.{language-name}.{protocol-name}.{output-flavor} is used, but especially when generating custom drivers, which are not part of the Apache PLC4X project, different package names are better suited. So in these cases, the user can simply override the default package name.

There is also an additional parameter: outputDir, which defaults to ${project.build.directory}/generated-sources/plc4x/ and usually shouldn’t require being changed in case of a Java project, but usually requires tweaking when generating code for other languages.

Here’s an example of a driver pom for building a S7 driver for java:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

      https://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
  -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.plc4x.plugins</groupId>
    <artifactId>plc4x-code-generation</artifactId>
    <version>0.13.0-SNAPSHOT</version>
  </parent>

  <artifactId>test-java-s7-driver</artifactId>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.plc4x.plugins</groupId>
        <artifactId>plc4x-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>test</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>generate-driver</goal>
            </goals>
            <configuration>
              <protocolName>s7</protocolName>
              <languageName>java</languageName>
              <outputFlavor>read-write</outputFlavor>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.apache.plc4x.plugins</groupId>
      <artifactId>plc4x-code-generation-driver-base-java</artifactId>
      <version>0.13.0-SNAPSHOT</version>
    </dependency>

    <dependency>
      <groupId>org.apache.plc4x.plugins</groupId>
      <artifactId>plc4x-code-generation-language-java</artifactId>
      <version>0.13.0-SNAPSHOT</version>
      <!-- Scope is 'provided' as this way it's not shipped with the driver -->
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.plc4x.plugins</groupId>
      <artifactId>plc4x-code-generation-protocol-s7</artifactId>
      <version>0.13.0-SNAPSHOT</version>
      <!-- Scope is 'provided' as this way it's not shipped with the driver -->
      <scope>provided</scope>
    </dependency>
  </dependencies>

</project>

So the plugin configuration is pretty straight forward, all that is specified, is the protocolName, languageName and the output-flavor.

The dependency:

    <dependency>
      <groupId>org.apache.plc4x.plugins</groupId>
      <artifactId>plc4x-code-generation-driver-base-java</artifactId>
      <version>0.13.0-SNAPSHOT</version>
    </dependency>

For example contains all classes the generated code relies on.

The definitions of both the s7 protocol and java language are provided by the two dependencies:

    <dependency>
      <groupId>org.apache.plc4x.plugins</groupId>
      <artifactId>plc4x-code-generation-language-java</artifactId>
      <version>0.13.0-SNAPSHOT</version>
      <!-- Scope is 'provided' as this way it's not shipped with the driver -->
      <scope>provided</scope>
    </dependency>

and:

    <dependency>
      <groupId>org.apache.plc4x.plugins</groupId>
      <artifactId>plc4x-code-generation-protocol-s7</artifactId>
      <version>0.13.0-SNAPSHOT</version>
      <!-- Scope is 'provided' as this way it's not shipped with the driver -->
      <scope>provided</scope>
    </dependency>

The reason for why the dependencies are added as code-dependencies and why the scope is set the way it is, is described in the Why are the protocol and language dependencies done so strangely? section.

Custom Modules

The plugin uses the Java Serviceloader mechanism to find modules.

Protocol Modules

In order to provide a new protocol module, all that is required, it so create a module containing a META-INF/services/org.apache.plc4x.plugins.codegenerator.protocol.Protocol file referencing an implementation of the org.apache.plc4x.plugins.codegenerator.protocol.Protocol interface.

This interface is located in the org.apache.plc4x.plugins:plc4x-code-generation-protocol-base module and generally only defines three methods:

package org.apache.plc4x.plugins.codegenerator.protocol;

import org.apache.plc4x.plugins.codegenerator.types.exceptions.GenerationException;

import java.util.Optional;

public interface Protocol {

    /**
     * The name of the protocol what the plugin will use to select the correct protocol module.
     *
     * @return the name of the protocol.
     */
    String getName();

    /**
     * Returns a map of type definitions for which code has to be generated.
     *
     * @return the Map of types that need to be generated.
     * @throws GenerationException if anything goes wrong parsing.
     */
    TypeContext getTypeContext() throws GenerationException;


    /**
     * @return the protocolVersion is applicable
     */
    default Optional<String> getVersion() {
        return Optional.empty();
    }

}

The name is being used for the module to find the right language module, so the result of getName() needs to match the value provided in the maven config-option protocolName.

As mentioned before, we support multiple versions of a protocol, so if getVersions() returns a non-empty version, this is used to select the version.

The most important method for the actual code-generation however is the getTypeContext() method, which returns a TypeContext type which generally contains a list of all parsed types for this given protocol.

Language Modules

Analog to the Protocol Modules the Language modules are constructed very similar.

The LanguageOutput interface is very simplistic too and is located in the org.apache.plc4x.plugins:plc4x-code-generation-language-base module and generally only defines four methods:

package org.apache.plc4x.plugins.codegenerator.language;

import org.apache.plc4x.plugins.codegenerator.types.definitions.ComplexTypeDefinition;
import org.apache.plc4x.plugins.codegenerator.types.exceptions.GenerationException;

import java.io.File;
import java.util.Map;

public interface LanguageOutput {

    /**
     * The name of the template is what the plugin will use to select the correct language module.
     *
     * @return the name of the template.
     */
    String getName();

    List<String> supportedOutputFlavors();

    /**
     * An additional method which allows generator to have a hint which options are supported by it.
     * This method might be used to improve user experience and warn, if set options are ones generator does not support.
     *
     * @return Set containing names of options this language output can accept.
     */
    Set<String> supportedOptions();

    void generate(File outputDir, String version, String languageName, String protocolName, String outputFlavor,
        Map<String, TypeDefinition> types, Map<String, String> options) throws GenerationException;

}

The file for registering Language modules is located at: META-INF/services/org.apache.plc4x.plugins.codegenerator.language.LanguageOutput

The name being used by the plugin to find the language output module defined by the maven config option languageName.

supportedOutputFlavors provides a possible list of flavors, that can be referred to by the maven config option outputFlavor.

supportedOptions provides a list of options that the current language module is able to use and which can be passed in to the maven configuration using the options settings.

Problems with Maven

Why are the 4 modules released separately?

We mentioned in the introduction, that the first 4 modules are maintained and released from outside the main PLC4X repository.

This is due to some restrictions in Maven, which result from the way Maven generally works.

The main problem is that when starting a build, in the validate-phase, Maven goes through the configuration, downloads the plugins and configures these. This means that Maven also tries to download the dependencies of the plugins too.

In case of using a Maven plugin in a project which also builds the maven plugin itself, this is guaranteed to fail - Especially during releases. While during normal development, Maven will probably just download the latest SNAPSHOT from our Maven repository and will be happy with this and not complain even if this version will be overwritten later on in the build. It will just use the new version as soon as it has to.

During releases however the release plugin changes the version to a release version and then spawns a build. In this case the build will fail because there is no Plugin with that version to download from anywhere. In this case the only option would be to manually build and deploy the plugin in the release version and to re-start the release (Which is not a nice thing for the release manager).

For this reason we have stripped down the plugin and its dependencies to an absolute minimum and have released that separately from the rest, hoping due to the minimality of the dependencies that we will not have to do it very often.

As soon as the tooling is released, the version is updated in the PLC4X build and the release version is used without any complications.

Why are the protocol and language dependencies done so strangely?

It would certainly be a lot cleaner, if we provided the dependencies to protocol and language modules as plugin dependencies.

However, as we mentioned in the previous subchapter, Maven tries to download and configure the plugins prior to running the build. So during a release the new versions of the modules wouldn’t exist, this would cause the build to fail.

We could release the protocol- and the language modules separately too, but we want the language and protocol modules to be part of the project, to not over-complicate things - especially during a release.

In order to keep the build and the release as simple as possible, we built the Maven plugin in a way, that it uses the modules dependencies and creates its own Classloader to contain all of these modules at runtime.

This brings the benefit of being able to utilize Maven’s capability of determining the build order and dynamically creating the modules build classpath.

Adding a normal dependency however would make Maven deploy the artifacts with the rest of the modules.

We don’t want that as both the protocol as well as the language-modules are useless as soon as they have been used to generate the code.

So we use a trick that is usually used in Web applications, for example: Here the vendor of a Servlet engine is expected to provide an implementation of the Servlet API. It is forbidden for an application to bring this along, but it is required to build the application.

For this the Maven scope provided, which tells Maven to provide it during the build, but to exclude it from any applications it builds, because it will be provided by the system running the application.

This is not quite true, but it does the trick.