I am trying to use AngleSharp to crawl a web page on my localhost. The page is generated dynamically with AngularJS. I am using AngleSharp to fetch the page and the AngleSharp scripting library to run its JavaScript. Below is my proof-of-concept code. I cannot figure out where to find the HTML of the page after JavaScript rendering is complete.
t.Result.Source.Text gives me the page source of the web page. Where can I find the source after JavaScript has finished rendering? I cannot even tell whether the JavaScript ran at all.
using System;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;

static void Main(string[] args)
{
    Task<IDocument> t = StartCrawl();
    t.Wait();
    // Source.Text holds the markup as it was originally transferred
    string textContent = t.Result.Source.Text;
    Console.ReadKey();
}

private static async Task<IDocument> StartCrawl()
{
    var config = Configuration.Default
        .WithDefaultLoader()
        .WithCss()
        .WithJavaScript();
    var context = BrowsingContext.New(config);
    var document = await context.OpenAsync("http://localhost:8000/#!/phones");
    return document;
}
Viewing the source of the URL gives me the following. How can I run all the scripts on the page after the page loads? I can see 16 scripts in the document.Scripts property.
<!doctype html>
<html lang="en" ng-app="phonecatApp">
  <head>
    <meta charset="utf-8">
    <title>Google Phone Gallery</title>
    <link rel="stylesheet" href="bower_components/bootstrap/dist/css/bootstrap.css" />
    <link rel="stylesheet" href="app.css" />
    <link rel="stylesheet" href="app.animations.css" />
    <script src="bower_components/jquery/dist/jquery.js"></script>
    <script src="bower_components/angular/angular.js"></script>
    <script src="bower_components/angular-animate/angular-animate.js"></script>
    <script src="bower_components/angular-resource/angular-resource.js"></script>
    <script src="bower_components/angular-route/angular-route.js"></script>
    <script src="app.module.js"></script>
    <script src="app.config.js"></script>
    <script src="app.animations.js"></script>
    <script src="core/core.module.js"></script>
    <script src="core/checkmark/checkmark.filter.js"></script>
    <script src="core/phone/phone.module.js"></script>
    <script src="core/phone/phone.service.js"></script>
    <script src="phone-list/phone-list.module.js"></script>
    <script src="phone-list/phone-list.component.js"></script>
    <script src="phone-detail/phone-detail.module.js"></script>
    <script src="phone-detail/phone-detail.component.js"></script>
  </head>
  <body>
    <div class="view-container">
      <div ng-view class="view-frame"></div>
    </div>
  </body>
</html>
In AngleSharp (as in a browser) there is no notion of a "source" after JavaScript has modified the page. You can look at the originally transferred source, but I guess that's not what you want.
If you want the string serialization of the DOM at a particular time (e.g., after a script has manipulated the DOM), just do:
var currentSource = document.ToHtml(); // current serialization of the DOM
Note that this represents the current state of your DOM in HTML (text) form.
What you did instead gives you the original source code:
var textContent = t.Result.Source.Text; // will always contain the original source
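To make the difference concrete, here is a minimal sketch that leaves scripting out entirely and changes the DOM from C# instead, standing in for what a script would do. The inline markup, the "view" id and the class name SourceVersusDom are made up for illustration. It also assumes a reasonably recent AngleSharp release where the extension methods live in the AngleSharp namespace; older releases may additionally need using AngleSharp.Extensions;.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;

static class SourceVersusDom
{
    static async Task Main()
    {
        var context = BrowsingContext.New(Configuration.Default);

        // Hypothetical inline markup standing in for the downloaded page.
        var document = await context.OpenAsync(req =>
            req.Content("<!doctype html><html><body><div id='view'></div></body></html>"));

        // Stand-in for a script: modify the DOM after parsing has finished.
        var item = document.CreateElement("p");
        item.TextContent = "rendered later";
        document.QuerySelector("#view").AppendChild(item);

        // The text that was originally handed to the parser.
        Console.WriteLine(document.Source.Text);

        // A serialization of the DOM as it is right now.
        Console.WriteLine(document.ToHtml());
    }
}
```

Only the second call should reflect the appended paragraph. The same comparison on the crawled document is one way to check whether any script actually changed the DOM.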