Scraper
While the Apache PLC4X API allows simple access to PLC resources, if you want to continuously monitor some values and have them retrieved in a pre-defined interval, the core PLC4X API method is a little bit uncomfortable.
Especially when you have multiple batches of data you want to have refreshed in different intervals.
In this case you need to take care of the scheduling of queries, need to manage the connection state (Check if the connection is still available and to apply countermeasures, if there are problems)
As we have encountered exactly the same problem for about every integration module we created, the Apache PLC4X team has created a tool called the Scraper
.
This tool automatically handles all of the tasks mentioned above.
Getting started with the Scraper
The Scraper can be found in the Maven module:
<dependency> <groupId>org.apache.plc4x</groupId> <artifactId>plc4j-scraper</artifactId> <version>0.13.0-SNAPSHOT</version> </dependency>
In general, you need 3 parts to work with the Scraper
:
1) A Scraper
Configuration
2) A Scraper
Implementation
3) A Handler to handle the results of Scraper
jobs
In the Scraper
Configuration you define the so-called jobs
.
Sources
Sources define connections to PLCs using PLC4X drivers.
Generally you can think of a Source
as a PLC4X connection string, given an alias name.
Jobs
A Job
defines which resources (PLC Addresses) should be collected from which Sources
with a given Trigger
.
All resources in a job will be collected as a batch.
Generally multiple types of triggers could theoretically be supported, but for now only a time triggered job (Aka SCHEDULED
) is actually supported.
In the near future we’re hoping that we will be able to support: - External triggers - Triggering collection based upon PLC-values
But, as to now, this has not been implemented yet.
Configuration using the Java API
The core of the Scraper configuration is the ScraperConfigurationTriggeredImplBuilder
class.
Use this to build the configuration objects used to bootstrap the Scraper.
ScraperConfigurationTriggeredImplBuilder builder = new ScraperConfigurationTriggeredImplBuilder();
As soon as you have your builder
instance, you should add at least one source
to it.
builder.addSource({connectionName}, {plc4xConnectionString});
The connectionName
will be what we use when configuring the job to reference which source it should use to collect.
In order to configure a job
we have to get an instance of a JobConfigurationTriggeredImplBuilder
.
JobConfigurationTriggeredImplBuilder jobBuilder = builder.job({jobName}, {triggerCommand});
This creates a new job
with a given name which is executed based on the information in the triggerCommand
.
As mentioned above, we currently only support a time-scheduled collection.
This generally requires just one parameter: The number of milliseconds
between each collection.
(SCHEDULED,1000)
Above would schedule a collection every 1000ms - so once every second.
Up to now this job would not be run anywhere, and it would also not collect anything.
So in order to have the job actually do something, we should assign it a source
to collect from.
jobBuilder.source({connectionName});
Here we could theoretically collect on multiple sources, by simply calling the source()
method multiple times.
All sources would be collected at the same time, whenever the trigger tells it to.
So the last thing we need to configure our first Scraper
job, is to add a few fields for it to collect.
jobBuilder.field({fieldName}, {fieldAddress});
The field
method has to be called for every field we want to add to the current job configuration.
It gives a PLC4X address string an easy to understand string name, just like when using the core PLC4X API.
As soon as we’re done adding fields, we configure the job by calling the build
method.
jobBuilder.build();
This configures the finished job and attaches that to the overall Scraper
configuration of the scraper configuration.
As soon as we’re done configuring jobs, we need to create the Scraper
configuration by calling the build
method on the builder
:
ScraperConfigurationTriggeredImpl scraperConfig = builder.build();
Running the Scraper
In order to run the Scraper
, the following boilerplate code is needed.
try { PlcDriverManager plcDriverManager = new PooledPlcDriverManager(); TriggerCollector triggerCollector = new TriggerCollectorImpl(plcDriverManager); TriggeredScraperImpl scraper = new TriggeredScraperImpl(scraperConfig, (jobName, sourceName, results) -> { ... }, triggerCollector); scraper.start(); triggerCollector.start(); } catch (ScraperException e) { log.error("Error starting the scraper", e); }
At first a new PooledPlcDriverManager
is created (It actually doesn’t have to be the pooled version, but we strongly suggest you use it as for some protocols the connection process is stressfull for the connected PLC).
With this plcDriverManager
we can then create a so-called TriggerCollector
, which we pass in the driver manager as argument.
Next comes the probably most important part: We configure the scraper, by binding a Scraper Configuration
, a ResultHandler
and a TriggerCollector
together.
After this, the scraper is ready to start, which is then done by calling start
on the scraper
as well as the triggerCollector
.
For the sake of clarity, here comes the definition of the ResultHandler
interface:
@FunctionalInterface public interface ResultHandler { /** * Callback handler. * @param jobName name of the job (from config) * @param connectionName alias of the connection (<b>not</b> connection String) * @param results Results in the form alias to result value */ void handle(String jobName, String connectionName, Map<String, Object> results); }
Configuration using a JSON
or YAML
file
As an alternative to using the Java API, the Scraper Configuration can also be read from a JSON
or YAML
document.
Here come some examples:
JSON:
{ "sources": { "connectionName": "connectionString" }, "jobs": [ { "name": "jobName", "triggerConfig": (SCHEDULED,10000) "sources": [ "connectionName" ], "fields": { "a": "{address-a}", "b": "{address-b}" } } ] }
YAML:
--- sources: connectionName: connectionString jobs: - name: jobName triggerConfig: (SCHEDULED,10000) sources: - connectionName fields: a: {address-a} b: {address-b}
In both cases, you can create the ScraperConfiguration
with the following code:
ScraperConfiguration conf = ScraperConfiguration.fromFile("{path to the JSON or YAML file}", ScraperConfigurationTriggeredImpl.class);