How to download Historical data from NSE using Python

Historical data is the most important ingredient when you are backtesting. NSE provides historical data for free in various time frames. To scrape that data, we have designed a function named equity_history() in the NSEPython library. The documentation follows:

Syntax: 

				
					equity_history(symbol,series,start_date,end_date)
				
			

Program Structure:

				
					
def equity_history(symbol,series,start_date,end_date):
    payload = nsefetch("https://www.nseindia.com/api/historical/cm/equity?symbol="+symbol+"&series=[%22"+series+"%22]&from="+start_date+"&to="+end_date+"")
    return pd.DataFrame.from_records(payload["data"])
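
The URL above is assembled by plain string concatenation, which works for a symbol such as SBIN. As a minimal sketch (not the library's own code), the same query string could be built with the standard library's urllib.parse.quote so that a symbol containing a reserved character such as & does not break the query. The helper name build_history_url is hypothetical, and whether the NSE endpoint accepts every encoded symbol in this form is an assumption.

```python
from urllib.parse import quote

BASE_URL = "https://www.nseindia.com/api/historical/cm/equity"

def build_history_url(symbol, series, start_date, end_date):
    """Hypothetical helper: same URL shape as above, with the values percent-encoded."""
    # safe="" encodes every reserved character in the symbol, e.g. "&" becomes "%26".
    encoded_symbol = quote(symbol, safe="")
    # Keep the square brackets literal and encode the double quotes as %22,
    # which reproduces the [%22EQ%22] fragment used in the original URL.
    encoded_series = quote('["' + series + '"]', safe="[]")
    return (
        BASE_URL
        + "?symbol=" + encoded_symbol
        + "&series=" + encoded_series
        + "&from=" + start_date
        + "&to=" + end_date
    )

print(build_history_url("SBIN", "EQ", "08-06-2021", "14-06-2021"))
print(build_history_url("M&M", "EQ", "08-06-2021", "14-06-2021"))
```

For SBIN this yields exactly the URL that the concatenation produces; for a symbol such as M&M the ampersand is encoded instead of being read as a parameter separator.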

				
			

Example:

				
symbol = "SBIN"
series = "EQ"
start_date = "08-06-2021"
end_date = "14-06-2021"
print(equity_history(symbol,series,start_date,end_date))
				
			

Problem

It returns a pandas DataFrame as output. The output is large, so we are not pasting it here. However, there are two significant limitations:

  • It returns an error or empty data if you search beyond 365 days.
  • It returns only 50 days of data even if you ask for the last 90 days, and discards the rest!

So, how do we improve this function?

Solution

Without further ado, we shall paste the code first and explain it as we go. Here, we have copied the previous code into a separate function called equity_history_virgin(). We shall use it as the base.

				
from nsepython import *
logging.basicConfig(level=logging.DEBUG)

def equity_history_virgin(symbol,series,start_date,end_date):
    # Same single request as the original equity_history(); str() guards against non-string dates
    url="https://www.nseindia.com/api/historical/cm/equity?symbol="+symbol+"&series=[%22"+series+"%22]&from="+str(start_date)+"&to="+str(end_date)
    payload = nsefetch(url)
    return pd.DataFrame.from_records(payload["data"])

				
			

Constructing the Caterpillar Function

Our hands are tied by the limit noted above: “It will return only 50 days of data even if you ask for the last 90 days and discard the rest!”. So, we need to divide the requested date range into chunks of 40 days, make one request per chunk, and then stitch the results together.

  • Suppose the number of days turns out to be 314. Then 314/40 = 7.85.
  • So the function runs a total of 8 times, because 314 = (40*7) + 34. The first 7 runs fetch 40 days each.
  • The 8th run fetches the remaining 34 days.

You may ask why I am using “40” instead of “50”. It is simply a safety margin, so that we stay comfortably under the limit.
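
The arithmetic above can be sketched in a few lines. This is only an illustration of the chunking idea: the names plan_chunks and date_windows are hypothetical and are not part of NSEPython. Consecutive windows share a boundary date, and a zero-length remainder is skipped.

```python
import datetime

CHUNK_DAYS = 40  # deliberately below the 50-day limit described above

def plan_chunks(total_days, chunk_days=CHUNK_DAYS):
    """Return (number_of_full_chunks, remainder_days) for a span of total_days."""
    return divmod(total_days, chunk_days)

def date_windows(start, end, chunk_days=CHUNK_DAYS):
    """Yield (window_start, window_end) pairs that together cover start..end."""
    step = datetime.timedelta(days=chunk_days)
    cursor = start
    # Full windows of chunk_days each
    while cursor + step <= end:
        yield cursor, cursor + step
        cursor += step
    # Remainder window, only if some days are left over
    if cursor < end:
        yield cursor, end

print(plan_chunks(314))  # (7, 34)

start = datetime.date(2021, 1, 1)
windows = list(date_windows(start, start + datetime.timedelta(days=314)))
print(len(windows))  # 8
```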

Part I

Now the agenda is:

  • The input dates arrive as text, so we convert them from strings to datetime objects.
  • Then we calculate the number of days between the two dates.
  • Then we design the caterpillar function as described above.
				
# The input dates arrive as text, so convert them from strings to datetime objects.
start_date = datetime.datetime.strptime(start_date, "%d-%m-%Y")
end_date = datetime.datetime.strptime(end_date, "%d-%m-%Y")
logging.info("Starting Date: "+str(start_date))
logging.info("Ending Date: "+str(end_date))

# Calculate the number of days between the two dates.
diff = end_date-start_date

logging.info("Total Number of Days: "+str(diff.days))
logging.info("Total FOR Loops in the program: "+str(int(diff.days/40)))
logging.info("Remainder Loop: " + str(diff.days-(int(diff.days/40)*40)))
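
Because strptime raises a ValueError for any date that is not in DD-MM-YYYY form, it can help to validate the input before making any request. The following is a minimal standalone sketch of that idea; the helper name parse_range is hypothetical and this is not how the library itself handles bad input.

```python
import datetime

DATE_FORMAT = "%d-%m-%Y"  # the DD-MM-YYYY format used throughout this article

def parse_range(start_text, end_text):
    """Parse both dates and return (start, end, number_of_days)."""
    try:
        start = datetime.datetime.strptime(start_text, DATE_FORMAT)
        end = datetime.datetime.strptime(end_text, DATE_FORMAT)
    except ValueError as exc:
        raise ValueError("dates must look like 08-06-2021 (DD-MM-YYYY): " + str(exc)) from exc
    if end < start:
        raise ValueError("end_date is earlier than start_date")
    return start, end, (end - start).days

start, end, days = parse_range("08-01-2021", "14-06-2021")
print(days, days // 40, days % 40)  # 157 3 37
```

The three printed values correspond to the "Total Number of Days", "Total FOR Loops" and "Remainder Loop" figures that the logging calls above report for the same date range.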
				
			

Part II

Now, we will run the loop:

  • Each time we get an output, we append it to the running result.
  • After the loop completes, we run the function once more for the remaining days and append that DataFrame too. That is the reason for the name "Caterpillar".
  • Notably, each response comes in descending date order, so we reverse the combined pandas DataFrame.
    This is why total.iloc[::-1] is used. Be aware that this also reverses the order of the chunks themselves, as the sample output further below shows.
  • The index of the output also ends up in a mess, so we reset it with .reset_index(drop=True).
				
chunks = []  # one DataFrame per request; combined after the final request
for i in range(diff.days // 40):

    temp_date = (start_date+datetime.timedelta(days=40)).strftime("%d-%m-%Y")
    start_date = datetime.datetime.strftime(start_date, "%d-%m-%Y")

    logging.info("Loop = "+str(i))
    logging.info("====")
    logging.info("Starting Date: "+str(start_date))
    logging.info("Ending Date: "+str(temp_date))
    logging.info("====")

    # DataFrame.append() was removed in pandas 2.0, so collect the chunks and use pd.concat() later
    chunks.append(equity_history_virgin(symbol,series,start_date,temp_date))

    logging.info("Length of the Table: "+ str(sum(len(chunk) for chunk in chunks)))

    # Preparation for the next loop: the next chunk starts where this one ended
    start_date = datetime.datetime.strptime(temp_date, "%d-%m-%Y")


# Remainder request: from the end of the last full chunk to the requested end date
start_date = datetime.datetime.strftime(start_date, "%d-%m-%Y")
end_date = datetime.datetime.strftime(end_date, "%d-%m-%Y")

logging.info("End Loop")
logging.info("====")
logging.info("Starting Date: "+str(start_date))
logging.info("Ending Date: "+str(end_date))
logging.info("====")

chunks.append(equity_history_virgin(symbol,series,start_date,end_date))
total = pd.concat(chunks)

logging.info("Finale")
logging.info("Length of the Total Dataset: "+ str(len(total)))
# Reverse the rows and rebuild the index from 0
payload = total.iloc[::-1].reset_index(drop=True)
print(payload)
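
If strict oldest-first order is needed, one option is to sort on the date column instead of reversing the rows. The sketch below is an alternative, not the library's method. It assumes the date column is mTIMESTAMP with values like 17-Feb-2021, as seen in the sample output further down, and that there is one row per date for a single symbol and series. The helper name stitch_chunks is hypothetical and the rows are made up for illustration. Because consecutive chunks share a boundary date, that date can appear twice, so duplicates are dropped as well.

```python
import pandas as pd

def stitch_chunks(chunks, date_column="mTIMESTAMP"):
    """Concatenate chunk DataFrames and return the rows oldest-first."""
    total = pd.concat(chunks, ignore_index=True)
    # Parse the text dates so that sorting is chronological rather than alphabetical.
    parsed = pd.to_datetime(total[date_column], format="%d-%b-%Y")
    total = total.assign(_parsed=parsed)
    # A boundary date shared by two chunks shows up twice; keep the first copy.
    total = total.drop_duplicates(subset="_parsed")
    total = total.sort_values("_parsed").drop(columns="_parsed")
    return total.reset_index(drop=True)

# Made-up rows for illustration only (not real NSE data), newest first within each chunk
older = pd.DataFrame({"mTIMESTAMP": ["17-Feb-2021", "16-Feb-2021"], "VWAP": [2.0, 1.0]})
newer = pd.DataFrame({"mTIMESTAMP": ["18-Feb-2021", "17-Feb-2021"], "VWAP": [3.0, 2.0]})

result = stitch_chunks([older, newer])
print(result["mTIMESTAMP"].tolist())  # ['16-Feb-2021', '17-Feb-2021', '18-Feb-2021']
```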
				
			

Stitching it Up

Finally, we combined the two parts above into the function definition of

def equity_history(symbol,series,start_date,end_date):

and then updated the NSEPython library. That’s the perk of an open-source library, right? We can modify any function instantly. So, all we have to run is:

				
from nsepython import *
logging.basicConfig(level=logging.INFO)

symbol = "SBIN"
series = "EQ"
start_date = "08-01-2021"
end_date = "14-06-2021"
print(equity_history(symbol,series,start_date,end_date))

				
			

Please note that we have used logging.basicConfig(level=logging.INFO). Otherwise, the messages from the logging.info() calls will not be shown, because Python's logging module only shows warnings and above by default.
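
The effect of the logging level can be seen in a small standalone sketch. It uses named loggers with an explicit level and an in-memory stream instead of basicConfig, purely so that the result is easy to inspect. The helper name captured_messages is hypothetical.

```python
import io
import logging

def captured_messages(level):
    """Log one INFO message on a logger set to `level` and return what was emitted."""
    stream = io.StringIO()
    logger = logging.getLogger("demo_" + logging.getLevelName(level))
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.info("Total Number of Days: 157")
    logger.removeHandler(handler)
    return stream.getvalue()

print(repr(captured_messages(logging.WARNING)))  # ''
print(repr(captured_messages(logging.INFO)))     # 'Total Number of Days: 157\n'
```

With the level at WARNING the INFO message is dropped, and with the level at INFO it is emitted.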

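Anyway, here is the output. In this log, the "Ending Date" lines inside the loops show the overall end date; the table grows by roughly 27 rows per loop, which is consistent with 40-day requests.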

				
					INFO:root:Starting Date: 2021-01-08 00:00:00
INFO:root:Ending Date: 2021-06-14 00:00:00
INFO:root:Total Number of Days: 157
INFO:root:Total FOR Loops in the program: 3
INFO:root:Remainder Loop: 37
INFO:root:Loop = 0
INFO:root:====
INFO:root:Starting Date: 08-01-2021
INFO:root:Ending Date: 2021-06-14 00:00:00
INFO:root:====
INFO:root:Length of the Table: 28
INFO:root:Loop = 1
INFO:root:====
INFO:root:Starting Date: 17-02-2021
INFO:root:Ending Date: 2021-06-14 00:00:00
INFO:root:====
INFO:root:Length of the Table: 55
INFO:root:Loop = 2
INFO:root:====
INFO:root:Starting Date: 29-03-2021
INFO:root:Ending Date: 2021-06-14 00:00:00
INFO:root:====
INFO:root:Length of the Table: 81
INFO:root:End Loop
INFO:root:====
INFO:root:Starting Date: 08-05-2021
INFO:root:Ending Date: 14-06-2021
INFO:root:====
INFO:root:Finale
INFO:root:Length of the Total Dataset: 106
                          _id CH_SYMBOL CH_SERIES  ...    VWAP   mTIMESTAMP   CA
0    6099206528330700080ff5a2      SBIN        EQ  ...  362.87  10-May-2021  NaN
1    609a71fe45df9f0008b768c2      SBIN        EQ  ...  362.80  11-May-2021  NaN
2    609bc37ec132690009effd5e      SBIN        EQ  ...  368.18  12-May-2021  NaN
3    609e667e885aee00088f452d      SBIN        EQ  ...  365.59  14-May-2021  NaN
4    60a25afd4e61470008bbebf9      SBIN        EQ  ...  376.90  17-May-2021  NaN
..                        ...       ...       ...  ...     ...          ...  ...
101  60251c628160710008fa5699      SBIN        EQ  ...  391.61  11-Feb-2021  NaN
102  60266de0e854190008018823      SBIN        EQ  ...  392.34  12-Feb-2021  NaN
103  602a6261e85419000827e6de      SBIN        EQ  ...  403.08  15-Feb-2021  NaN
104  602bb3db05368f0008dd72ad      SBIN        EQ  ...  407.39  16-Feb-2021  NaN
105  602d057ee89593000836a7a8      SBIN        EQ  ...  410.54  17-Feb-2021  NaN

[106 rows x 24 columns]
[Finished in 1.788s]
				
			

We will keep making the NSEPython library better, and so can you!
