0
0
mirror of https://github.com/RSS-Bridge/rss-bridge.git synced 2024-11-23 23:49:34 +00:00
RSS-Bridge_rss-bridge/docs/05_Bridge_API/04_WebDriverAbstract.md
hleskien 8e8028b786
Adopt WebDriverAbstract as a solution for active (JavaScript) websites (#3971)
* first working version

---------

Co-authored-by: Dag <me@dvikan.no>
2024-02-10 04:42:22 +01:00

83 lines
2.6 KiB
Markdown

`WebDriverAbstract` extends [`BridgeAbstract`](./02_BridgeAbstract.md) and adds functionality for generating feeds
from active websites that use XMLHttpRequest (XHR) to load content and / or JavaScript to
modify content.
It highly depends on the php-webdriver library which offers Selenium WebDriver bindings for PHP.
- https://github.com/php-webdriver/php-webdriver (Project Repository)
- https://php-webdriver.github.io/php-webdriver/latest/ (API)
Please note that this class is intended as a solution for websites _that cannot be covered
by the other classes_. The WebDriver starts a browser and is therefore very resource-intensive.
# Configuration
You need a running WebDriver to use bridges that depend on `WebDriverAbstract`.
The easiest way is to start the Selenium server from the project of the same name:
```
docker run -d -p 4444:4444 --shm-size="2g" docker.io/selenium/standalone-chrome:latest
```
- https://github.com/SeleniumHQ/docker-selenium
With these parameters only one browser window can be started at a time.
On a multi-user site, Selenium Grid should be used
and the number of sessions should be adjusted to the number of processor cores.
Finally, the `config.ini.php` file must be adjusted so that the WebDriver
can find the Selenium server:
```
[webdriver]
selenium_server_url = "http://localhost:4444"
```
# Development
While you are programming a new bridge, it is easier to start a local WebDriver because then you can see what is happening and where the errors are. I've also had good experience recording the process with a screen video to find any timing problems.
```
chromedriver --port=4444
```
- https://chromedriver.chromium.org/
If you start rss-bridge from a container, then Chrome driver is only accessible
if you call it with the `--allowed-ips` option so that it binds to all network interfaces.
```
chromedriver --port=4444 --allowed-ips=192.168.1.42
```
The **most important rule** is that after an event such as loading the web page
or pressing a button, you often have to explicitly wait for the desired elements to appear.
A simple example is the bridge `ScalableCapitalBlogBridge.php`.
A more complex and relatively complete example is the bridge `GULPProjekteBridge.php`.
# Template
Use this template to create your own bridge.
```PHP
<?php
class MyBridge extends WebDriverAbstract
{
const NAME = 'My Bridge';
const URI = 'https://www.example.org';
const DESCRIPTION = 'Further description';
const MAINTAINER = 'your name';
public function collectData()
{
parent::collectData();
try {
// TODO
} finally {
$this->cleanUp();
}
}
}
```