Using Excel Visual Basic for Applications (VBA) to capture Company financials from a Web-site
Wishing everyone a very Happy New Year 2021 !
In this brief article, I will demonstrate how to use Excel VBA for capturing company financial information from a website. There are two main reasons why I am writing this article. One is that despite the emergence of platforms such as Hadoop for Big Data or Python for predictive analytics, most of the day-to-day work done in the BFSI sector is in plain MS Excel. The second reason is that most are unaware of the true potential of Excel and MS Office bundle in processing and analysis of large volumes of data. After reading this article, I hope that those working in the BFSI sector would be able to understand the capability of MS Office and Excel and how best it could be used in day-to-day working. So let’s get started!
For the uninitiated, VBA is a full-fledged programming language that comes packaged along with the MS Office product suite. Yes! It is not just a scripting language as some think. It is a complete programming language based on the syntax of the BASIC language.
The Basics!
I am going to try to avoid heavy technical jargon, but there are some things which cannot be explained without technical words. I will try to keep it as practical as possible.
The Microsoft Office Object Model
Before we begin, it is important to understand that Microsoft has followed an object oriented model in designing its MS Office suite of products which includes MS Word, Excel, Power-point, MS Access. What this practically means is that everything we work with in MS Word, or Excel can be thought of as an object. For instance, when it comes to Excel, we can think of the Excel “Application”, “Workbook”, “Worksheet” as objects and when we work within a “Worksheet”, we can think of a “Range” of cells, or a “Cell” itself as an object. Now all these are objects in Excel and each object has a set of properties or attributes and methods or functions. All objects within Excel are members or sub-members of the “Application” object.
The Microsoft Office Visual Basic Editor
MS Office provides a programming environment packaged along with Excel. To access this programming environment you need to first create an excel file called “stock_data_analysis” and save it as a macro enabled workbook with extension “.xlsm”. Open the empty file and use ALT+F11 on your laptop or PC and it will take you to Microsoft Visual Basic Editor. This is where you will be working with while writing Visual Basic code. Once you open this Window, you will see project explorer on the left side. In this, you will see a list of all main objects within the workbook, i.e. the workbook itself, the worksheets contained within and a module. You will need to right click on the folder Microsoft Excel Objects and select Insert ->Module and that will create a Module1 object. Double click on Module1 to start typing the code.
Now this is what your screen will look like:
The Website
For capturing the financials of a Company, we will be using the example of “Screener.in”, a website which displays detailed financial information about a Company on its very first page. As an example, you can try visiting the website and type in "Infosys" in the search box and hit Enter. The following link will open:
https://www.screener.in/company/INFY/consolidated/
"Consolidated" financials of Infosys Limited shall be displayed on the web page. Notice that "INFY" is the symbol of Infosys Limited on the stock exchanges. What this means is that if we insert the symbol of a listed Company such as TCS in place of INFY, we will get the consolidated financials of that Company e.g. TCS.
Recommended by LinkedIn
A snapshot of the webpage and URL are given below:
The Logic
We shall be writing a Visual Basic “module” that will take in the stock symbol as input and capture the financials displayed in the excel worksheet. If you notice on the above web-page, the financials are displayed one below the other in various tables. In HTML, tables begin with the <table> tag and the individual rows with <tr> and columns with <td>. To parse a table, we will need to look for these tags within the page code and capture the contents within these tags in the same order.
For this we need to add the WinHTTP services library in the VBA which has all the related objects and methods to help us parse the contents of a web-page displayed i.e. the URL mentioned above.
For this, please go into Tools -> References and put a tick mark next to Microsoft WinHTTP Services, as shown below in the picture.
Next put a tick mark next to Microsoft XML, V6.0 and click on OK.
The code
Copy the code in the below snippet as it is and paste it into the Module 1 in your Visual Basic editor. Refer the comments above each line of code to understand what it does. The statements beginning with a single quote are treated as comments in VBA and are for the programmer’s reference only. Everything else is executable code. Do not get inundated by the number of lines of the snippet as most of it is comments. Read it line by line and go through the comments to understand what each line does.
'We define a sub-routine "Capture_Financials_Screener" which takes in 'stockSymbol and stockName as inputs
Public Sub Capture_Financials_Screener(stockSymbol As String, stockName As String)
'In VBA Dim statements are used to define variables. Here we are defining a 'variable xmlHTTP of type Object. An object type is a generic type in VBA 'which is used when we use something called as late-binding in coding. Late-'binding is essentially when the actual type of object is defined during run-'time and not initially. We also define other variables as Objects which we 'shall be using later on to parse ‘HTML tables.
Dim xmlHttp As Object
Dim TR_col As Object, Tr As Object
Dim TD_col As Object, Td As Object
Dim row As Long, col As Long
Dim str_stock_symbol() As String
Dim rng As Range
‘Here we are setting the xmlHttp variable defined earlier to an object of 'MSXML2.XMLHTTP.6.0 type.
Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
'Here we are inserting the stockSymbol variable which is an input to this 'sub-routine into the screener.in URL we have seen above. If the Company has 'consolidated financials, this URL will be used to request data from '“screener.in”, else the URL ‘defined below, over the “Goto Redo” statement 'shall be used. (Refer the fourth last line of the code.)
myURL = "https://www.screener.in/company/" & stockSymbol & "/consolidated/"
'Redo is a flag we have set for commencing the actual code for parsing the 'web-page.
Redo:
'This statement calls the Open function of the xmlHttp object and uses the 'GET call on the above URL. To call a function we just type the object name 'followed by a dot and the function name. A GET call is used to request data 'from a website. This is usually executed by the browser such as Chrome or IE 'when we type the URL into the web-browser. Here we are using it in our code 'in VBA to get the data from screener.in from the URL defined above.
xmlHttp.Open "GET", myURL, False
‘This statement sets the content type of the xmlHttp object to text/xml.
xmlHttp.setRequestHeader "Content-Type", "text/xml"
'This statement calls the send function of the xmlHttp object and sends the '“GET” along with the above ‘information of the xmlHttp object to the 'website.
xmlHttp.send
'Here we define and create another object, html for storing the contents 'which are received from the webpage ‘following the GET call submitted above.
Dim html As Object
Set html = CreateObject("htmlfile")
'Here we call the body.innerHTML property of the html object to capture the 'contents of the webpage. For that we use the responsetext property of the 'xmlHttp object which returns the text received from a web-site following 'the GET call.
html.body.innerHTML = xmlHttp.responseText
‘Here we define another object, tbl for looping through tables.
Dim tbl As Object
'the getElementsbyTagName function returns a collection of all elements in a 'web-page with the referred Tag Name. In this case it returns all the 'elements with <table> tag i.e. all tables on the web page of the html 'object. Note that we had earlier stored the text returned from the webpage 'in to html object.
Set Tables = html.getElementsByTagName("table")
'Here, the length function of collections object "Tables" is called to check 'if its length is zero. If yes, this means the webpage does not have any 'tables. Hence we exit the sub-routine using the "Exit Sub" statement after 'displaying a message box with a relevant message.
If Tables.Length = 0 Then
MsgBox "No data found for the particular stock symbol"
Exit Sub
End If
'Now if there are tables in the web-page we begin to parse the table contents 'and displaying the contents in the WorkSheet.
'First we define a variable Z = 1 which stands for the first row of the 'Worksheet. So we begin with the first row of the Worksheet.
Z = 1
‘Next we define a variable counter = 0 which is a counter for the individual tables of the Worksheet.
counter = 0
'Next we define a loop. We use object tb to loop through the collection of 'tables. This loop will run for each object tb in the collection of tables.
For Each tb In Tables
‘Increase the counter for tables by 1
counter = counter + 1
‘Next we check the counter number and display the Table Name at the beginning 'of each table in the Cell located at row Z and Column 1. We use the 'Thisworkbook object which denotes the current workbook in which we are 'working and the Sheets collection. The Sheets(“Sheet1”) is used to refer to 'sheet1 of the Sheets collection which is the name of the default first 'worksheet in excel. If your worksheet has some other name, replace Sheet1 'with that name. The ".Cells (Z, 1)" is used to refer to the cell at row Z 'and column 1. As the loop will run, Z shall be incremented in each loop by 1 'and hence the cells will also change accordingly. Thus, if the table counter 'is 1, it is quarterly performance, if it 2 it is Annual performance and so 'on.
If counter = 1 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "Quarterly Performance"
Z = Z + 1
ElseIf counter = 2 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "Annual Performance"
Z = Z + 1
ElseIf counter = 3 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "Compounded Sales Growth"
Z = Z + 1
ElseIf counter = 4 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "Compounded profit growth"
Z = Z + 1
ElseIf counter = 5 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "Stock price CAGR"
Z = Z + 1
ElseIf counter = 6 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "ROE"
Z = Z + 1
ElseIf counter = 7 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 1) = "Balance-Sheet"
Z = Z + 1
ElseIf counter = 8 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 11) = "Cash Flows"
Z = Z + 1
ElseIf counter = 9 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 11) = "Ratios"
Z = Z + 1
ElseIf counter = 10 Then
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, 11) = "Shareholding Pattern"
Z = Z + 1
End If
'Similar to tag name Table, now we target the tag <thead> which stands for 'table header. The explanation for this is similar to the above explanation 'for tables collection and looping through the tables collection. 'Effectively we are looping through all elements with the tag <thead>.
'Within this, we are looping through all rows which have tags <tr> and all 'columns which have the tag <th> in HTML.
Set hHead = tb.getElementsByTagName("thead")
For Each hh In hHead
Set hTr = hh.getElementsByTagName("tr")
'Y=1 resets the column to the first column after completion of each table 'row. This is important as otherwise the tables shall not be displayed one 'below the other
y = 1
For Each Tr In hTr
Set htd = Tr.getElementsByTagName("th")
'Y=1 resets the column to the first column after completion of each table 'row. This is important as otherwise the tables shall not be displayed one 'below the other
y = 1
For Each th In htd
'This is where we display the contents of the respective HTML tag object into 'the respective cell in the worksheet. The property innerText returns the 'text contained in ‘the HTML tag object. Mostly these will be financial 'numbers or period references. We display them in the same format as on the 'web page.
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, y) = th.innerText
'Set the format of the cell as date as it is the header
'and has dates i.e. period of the financials
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, y).NumberFormat = "MMM-YYYY"
'Set the borders of the cell to make it look like a table
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, y).Borders.LineStyle = xlContinuous
'Increment y by 1 to move to the next column
y = y + 1
Next th
'Here we increment the row counter Z by 1 i.e. we move on to the next row of 'the worksheet for displaying the next HTML table row. This is an important 'step and if we don’t do this, the loop will keep over-writing. contents on 'the same worksheet row. And we don’t want that to happen.
Z = Z + 1
‘Take the next TR tag within the loop.
Next Tr
Exit For
‘Take the next TH tag within the loop.
Next hh
'Similar to tag <table>, now we target the tag <tbody> which stands for table 'body. The explanation ‘for this is similar to the above explanation for 'tables collection and looping through the tables collection. Effectively we 'are looping through all elements with the tag <tbody>. Within this, we are 'looping through all rows which have tags “<tr>” and all columns which have 'the tag “<td>” in HTML. The code is similar to the above code which captures 'the table header contents.
Set hBody = tb.getElementsByTagName("tbody")
For Each bb In hBody
Set hTr = bb.getElementsByTagName("tr")
For Each Tr In hTr
Set htd = Tr.getElementsByTagName("td")
y = 1
For Each Td In htd
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, y).Value = Td.innerText
'Here we defined the cell format as number because this will have financial
'figures not dates.
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, y).NumberFormat = "0.00"
'Define the cell borders to make it look like a table
ThisWorkbook.Sheets(“Sheet1”).Cells(Z, y).Borders.LineStyle = xlContinuous
'increment y by 1 to move to the next column
y = y + 1
Next Td
'Increment Z by 1 to move to the next worksheet row after <tr> row has ended.
Z = Z + 1
Next Tr
Exit For
Next bb
'We increment the Z by 1 i.e. we move on to the new row of the worksheet.
Z = Z + 1
Next tb
'here we check whether any data has been captured. If not, we assume that 'since consolidated data is ‘not captured, there is no consolidated financial 'data for the Company and instead look for standalone ‘data with a new URL. 'We define the new URL for standalone financials and go back to Redo using 'the Go to Statement. We have defined the Redo Flag above and this is what it 'is used for
If ThisWorkbook.Sheets(“Sheet1”).Cells(3, 12) = "" And ThisWorkbook.Sheets(“Sheet1”).Cells(18, 12) = "" Then
‘The URL to be used with the Company has only standalone financials
myURL = "https://www.screener.in/company/" & stockSymbol & "/"
GoTo Redo:
End If
End Sub
'This is the main routine Stock_Price_Alert which calls the above routine
Public Sub Stock_Price_Alert()
'Here we call the sub-routine Capture_Financials_Screener and pass
'INFY and Infosys Limited as inputs or parameters to the routine
Call Module1.Capture_Financials_Screener(“INFY”, “Infosys Limited”)
End Sub
Concluding remarks
Upon executing the above macro by clicking on Run->Run Sub or by using F5, you should have the financial data of Infosys Limited captured in the Sheet1 of your Workbook.
If you have understood what we have done above, you can use this kind of logic and code to capture information from any web-site of your choice. Do however refer to the terms of the website/service to check whether it has any negative covenants with regard to web-scraping or automation of data capturing. In other words, check for legality before doing anything like this.
I have tried packing the most explanation into as brief an article as possible while keeping the concepts practical and less technical. Do let me know your feedback on whether this article was useful in not just capturing Company financials, with “Copy and Paste” but also in conceptually understanding the Excel VBA environment and more importantly the capabilities of Excel VBA in automating routine tasks and its utility in financial analysis.
If you need any help with the code or understanding any part of it, or if you have a specific feedback, reach out to me at prasad.athalye@gmail.com and I’ll be happy to help.
Functional Head - Retail Lending Controls, Bank Control Unit, Bandhan Bank
3yThanks to David Sharpe for pointing out two small left over errors in the code. https://www.linkedin.com/in/david-sharpe-9104a6128 An unnecessary "End if" and "End With" (both removed)
Bhartiya from Bharat 🇮🇳
3yVery insightful article.