
Manipulate a website's HTML before scraping

Hello, there is a link on a page that opens in a new window because it uses a target="_blank" attribute. I am wondering if there is a way to remove this attribute from the page so that the link opens in the same window. URL: http://www.michigan.gov/mdhhs/0%2c5885%2c7-339-71551_2945_42542_42543_42546_42551-16459--%2c00.html The hyperlink with target="_blank" is "List of Sanctioned Providers (XLS)".
User
19-01-2018 15:57:13 -0500

1 Answer

Hi,

For this scenario, I would create a navigation sequence in the ITPilot wrapper generation tool by doing the following steps:

1. Create a new wrapper and add a Sequence component that navigates to the primary URL.
2. Drag an Extractor component onto the workspace and link it after the Sequence component.
3. In the Extractor, select 'Assign Examples' and create a new example field with the value of the link's 'href' attribute.
4. Once the field and its value are obtained, use another Sequence component to navigate to this link.
5. Finally, save the Extractor component's configuration to test the wrapper.

Because target=_blank is part of the page's HTML source, it cannot be removed: ITPilot is not able to modify the source data. Navigating directly to the extracted 'href' value avoids the new window instead.

For more information, you can have a look at the sections [Web Browsing Automation](https://community.denodo.com/docs/html/browse/6.0/itpilot/generation_environment/generation_environment_tools_-_part_i/web_browsing_automation/web_browsing_automation) and [Assigning the First Example](https://community.denodo.com/docs/html/browse/6.0/itpilot/generation_environment/generation_environment_tools_-_part_i/configuration_of_the_extractor_component/defining_the_structure_of_the_data_and_assigning_examples#assigning-the-first-example) of ITPilot.

Hope this helps!
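
To illustrate the idea behind the steps above (read the link's 'href' value and navigate to it directly, rather than clicking a link that opens a new window), here is a minimal sketch in plain Python. It is not ITPilot code and does not use any Denodo API. The class name `BlankTargetLinkParser`, the helper `blank_target_links`, the sample markup and the `www.example.com` base URL are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BlankTargetLinkParser(HTMLParser):
    """Collect (text, absolute URL) pairs for <a> elements with target="_blank"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # results: list of (link text, absolute URL)
        self._href = None    # URL of the matching anchor currently being read
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Only keep anchors that would open a new window and have an href.
        if (attrs.get("target") or "").lower() == "_blank" and attrs.get("href"):
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def blank_target_links(html, base_url):
    """Return the target="_blank" links found in html, resolved against base_url."""
    parser = BlankTargetLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, not the real content of the page in the question.
SAMPLE_HTML = (
    '<p><a href="/documents/list.xls" target="_blank">'
    "List of Sanctioned Providers (XLS)</a> "
    '<a href="/other">Other</a></p>'
)
links = blank_target_links(SAMPLE_HTML, "http://www.example.com/page.html")
```

Each URL in `links` can then be requested directly in the same session, which is the role the second Sequence component plays in the wrapper.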
Denodo Team
30-01-2018 00:39:02 -0500