How to Easily Fetch Binance Historical Trades Using Python

The script will use the following arguments:

  • symbol: The symbol of the trading pair, as defined by Binance. It can be queried here, or copied from the last part of the Binance web app URL with the ‘_’ character removed (for example, BTC_USDT becomes BTCUSDT).
  • starting_date and ending_date: Self-explanatory. The expected format is mm/dd/yyyy, or, in Python slang, %m/%d/%Y.
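If you would rather not copy the symbol by hand, the URL rule is easy to automate. This is a minimal sketch: the helper name symbol_from_trade_url and the example URL shape (…/trade/BTC_USDT) are assumptions for illustration, so double-check the result against Binance’s own symbol list before relying on it.

```python
from urllib.parse import urlparse


def symbol_from_trade_url(url: str) -> str:
    """Hypothetical helper: derive a Binance symbol from a web-app trade URL.

    Takes the last path segment (e.g. 'BTC_USDT') and drops the '_' separator.
    Query strings such as '?layout=pro' are ignored by urlparse's .path.
    """
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    return last_segment.replace("_", "").upper()


print(symbol_from_trade_url("https://www.binance.com/en/trade/BTC_USDT"))  # BTCUSDT
```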

To read the arguments, we’ll use the built-in sys module (nothing too fancy around here), and to parse the dates, we will use the datetime module.

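import sys
from datetime import datetime, timedelta
symbol = sys.argv[1]
starting_date = datetime.strptime(sys.argv[2], '%m/%d/%Y')
ending_date = datetime.strptime(sys.argv[3], '%m/%d/%Y') + timedelta(days=1) - timedelta(microseconds=1)
from_date, to_date = starting_date, ending_date  # names used by the later snippets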

We add one day and subtract one microsecond so that ending_date always falls at 23:59:59.999999 of the given day. That makes same-day intervals practical: passing the same date as both starting_date and ending_date fetches that whole day.
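The same parsing can be done with argparse, which adds usage help and a basic sanity check for free. This is a minimal sketch, not the author’s script: the parse_args function name and the “ending date before starting date” check are additions for illustration.

```python
import argparse
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y"


def parse_args(argv=None):
    """Parse the symbol and an inclusive mm/dd/yyyy date range (sketch)."""
    parser = argparse.ArgumentParser(description="Fetch Binance aggregate trades")
    parser.add_argument("symbol", help="Binance symbol, e.g. BTCUSDT")
    parser.add_argument("starting_date", help="first day, mm/dd/yyyy")
    parser.add_argument("ending_date", help="last day (inclusive), mm/dd/yyyy")
    args = parser.parse_args(argv)

    start = datetime.strptime(args.starting_date, DATE_FORMAT)
    # Move the end to the last microsecond of its day: 23:59:59.999999
    end = (datetime.strptime(args.ending_date, DATE_FORMAT)
           + timedelta(days=1) - timedelta(microseconds=1))
    if end < start:
        parser.error("ending_date must not be before starting_date")
    return args.symbol, start, end


symbol, start, end = parse_args(["BTCUSDT", "01/31/2021", "01/31/2021"])
print(start, end)  # 2021-01-31 00:00:00 2021-01-31 23:59:59.999999
```

Calling parse_args() with no argument reads sys.argv, exactly like the original snippet.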

With Binance’s API, the aggTrades endpoint returns at most 1000 trades per request, and if we pass start and end times, they can be at most 1 hour apart. I first tried fetching by time intervals, but sooner or later liquidity would spike, more than 1000 trades would land in a single interval, and I would lose some precious trades. So I decided to try the from_id strategy.

The aggTrades endpoint is chosen because it returns compressed trades, so we don’t lose any precious information. From the Binance API documentation:

“Get compressed, aggregate trades. Trades that fill at the time, from the same order, with the same price will have the quantity aggregated.”
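To make the quoted rule concrete, here is a toy model of what aggregation does to a sequence of individual fills. It is only an illustration: Binance does this server-side and the code below is not its implementation; the field names time, order_id, price and qty are assumptions, and only consecutive fills are merged.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter


def aggregate_fills(fills):
    """Toy model: merge consecutive fills sharing time, order and price."""
    aggregated = []
    for (time_ms, order_id, price), group in groupby(
            fills, key=itemgetter("time", "order_id", "price")):
        aggregated.append({
            "time": time_ms,
            "order_id": order_id,
            "price": price,
            # Quantities of the merged fills are summed exactly with Decimal
            "qty": sum((f["qty"] for f in group), Decimal(0)),
        })
    return aggregated


fills = [
    {"time": 1000, "order_id": 7, "price": Decimal("10.0"), "qty": Decimal("0.5")},
    {"time": 1000, "order_id": 7, "price": Decimal("10.0"), "qty": Decimal("0.25")},
    {"time": 1000, "order_id": 7, "price": Decimal("10.1"), "qty": Decimal("1")},
]
for trade in aggregate_fills(fills):
    print(trade["price"], trade["qty"])
# 10.0 0.75
# 10.1 1
```

Three fills become two aggregate trades: the total quantity at each price is preserved, only the split between individual fills is lost.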

The from_id strategy goes like this: first, we get the ID of the first trade on starting_date by sending a date interval to the endpoint. Then we fetch 1000 trades starting from that ID and check whether the last one happened after ending_date. If it did, we have covered the whole period and can save the results to a file. Otherwise, we set from_id to the ID of the last trade fetched and start the loop all over again.
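The strategy is easy to check offline before touching the API. The sketch below runs the same loop against an in-memory list of fake trades; fetch_all_from_id, fake_fetch_page and the page size of 7 are hypothetical stand-ins for the real aggTrades request. It also stops when a page contains only the trade it started from, which happens when no newer trades exist yet (for example, when the ending date is in the future).

```python
def fetch_all_from_id(fetch_page, first_id, end_ms):
    """Paginate by trade ID until a trade after end_ms is seen (sketch).

    fetch_page(from_id) must return trades with ascending 'a' (ID) and 'T'
    (timestamp in ms), starting at from_id inclusive.
    """
    trades = []
    from_id = first_id
    while True:
        page = fetch_page(from_id)
        if not page:
            break  # nothing more to fetch
        trades.extend(page)
        last = page[-1]
        if last["T"] > end_ms:
            break  # we have passed the end of the period
        if last["a"] == from_id:
            break  # only the already-seen trade came back: no newer data yet
        from_id = last["a"]  # the next page starts at (and repeats) this trade
    # Deduplicate the overlapping boundary trades and trim past the end
    unique = {t["a"]: t for t in trades}
    return [unique[a] for a in sorted(unique) if unique[a]["T"] <= end_ms]


# Fake exchange: trade IDs 100..129, one trade per second
FAKE_TRADES = [{"a": i, "T": 1000 * i} for i in range(100, 130)]


def fake_fetch_page(from_id, limit=7):
    return [t for t in FAKE_TRADES if t["a"] >= from_id][:limit]


result = fetch_all_from_id(fake_fetch_page, first_id=103, end_ms=120_000)
print(result[0]["a"], result[-1]["a"], len(result))  # 103 120 18
```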

Ugh, enough talking, let’s code.

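import requests
from datetime import timedelta

def get_first_trade_id_from_start_date(symbol, from_date):
    # Only look at the first 60 seconds after from_date
    new_ending_date = from_date + timedelta(seconds=60)
    r = requests.get('https://api.binance.com/api/v3/aggTrades',
                     params={
                         "symbol": symbol,
                         "startTime": get_unix_ms_from_date(from_date),
                         "endTime": get_unix_ms_from_date(new_ending_date)
                     })
    response = r.json()
    if len(response) > 0:
        # Trades are sorted by time, so the first item is the earliest one
        return response[0]['a']
    else:
        raise Exception('no trades found')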

First, we create new_ending_date. That’s because, for this first request, we call aggTrades with a startTime and an endTime parameter. For now, we only need the ID of the first trade of the period, so a 60-second window is usually enough. For low-liquidity pairs you may need to widen it, because there is no guarantee that a trade occurred in the first minute of the requested day.
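One way to avoid tuning that 60-second window by hand is to scan forward in consecutive windows until a trade shows up. This is a sketch under assumptions: find_first_trade_id and fake_window are hypothetical, the one-hour window reflects the limit described earlier, and a real fetch_window would wrap the aggTrades request with startTime and endTime. Passing the request in as a function keeps the scan testable offline.

```python
ONE_HOUR_MS = 60 * 60 * 1000


def find_first_trade_id(fetch_window, start_ms, stop_ms, window_ms=ONE_HOUR_MS):
    """Scan [start_ms, stop_ms] in consecutive windows for the first trade (sketch).

    fetch_window(start_ms, end_ms) should return the trades in that window,
    sorted by time, like an aggTrades startTime/endTime request.
    """
    window_start = start_ms
    while window_start <= stop_ms:
        window_end = min(window_start + window_ms - 1, stop_ms)
        trades = fetch_window(window_start, window_end)
        if trades:
            return trades[0]["a"]
        window_start = window_end + 1
    raise LookupError("no trades found in the requested period")


# A quiet market: the first trade of the day happens 3.5 hours in
FAKE = [{"a": 42, "T": 3 * ONE_HOUR_MS + 30 * 60 * 1000},
        {"a": 43, "T": 5 * ONE_HOUR_MS}]


def fake_window(start_ms, end_ms):
    return [t for t in FAKE if start_ms <= t["T"] <= end_ms]


print(find_first_trade_id(fake_window, 0, 24 * ONE_HOUR_MS - 1))  # 42
```

The trade-off is request count: a pair with no trades at all for a whole day costs 24 requests before LookupError is raised.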

Both dates are converted to a Unix millisecond representation by our helper function, which uses calendar.timegm. timegm is preferred because it interprets the time tuple as UTC, whereas time.mktime would interpret it as local time.

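import calendar

def get_unix_ms_from_date(date):
    # Seconds since the epoch (date treated as UTC), scaled to ms, plus the sub-second part
    return int(calendar.timegm(date.timetuple()) * 1000 + date.microsecond / 1000)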
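A quick comparison shows why timegm matters: time.mktime reads the same time tuple as local time, so on a machine that is not set to UTC it yields a different timestamp. The helper is repeated so the block runs on its own; the timezone-aware line is shown only as an alternative, not as the article’s method, and the mktime output depends on your machine’s time zone, so no value is given for it.

```python
import calendar
import time
from datetime import datetime, timezone


def get_unix_ms_from_date(date):
    # timegm treats the naive datetime's fields as UTC
    return int(calendar.timegm(date.timetuple()) * 1000 + date.microsecond / 1000)


d = datetime(2020, 1, 1)  # naive, meant as 2020-01-01 00:00 UTC
print(get_unix_ms_from_date(d))                                # 1577836800000
print(int(d.replace(tzinfo=timezone.utc).timestamp() * 1000))  # 1577836800000
print(int(time.mktime(d.timetuple()) * 1000))  # shifted by the local UTC offset
```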

The request’s response is a list of trade objects sorted by date, with the following format:

[
  {
    "a": 26129,         // Aggregate tradeId
    "p": "0.01633102",  // Price
    "q": "4.70443515",  // Quantity
    "f": 27781,         // First tradeId
    "l": 27781,         // Last tradeId
    "T": 1498793709153, // Timestamp
    "m": true,          // Was the buyer the maker?
    "M": true           // Was the trade the best price match?
  }
]

So, as we need the ID of the first trade, we return the response[0]["a"] value.
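Price and quantity arrive as strings, so it can help to convert each item into a typed record before doing any arithmetic. This is a sketch: the AggTrade class and its field names are hypothetical, mapped from the keys listed above, and Decimal is used so the exchange’s decimal strings are kept exactly rather than rounded to floats. The "M" flag is ignored here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AggTrade:
    trade_id: int         # "a"
    price: Decimal        # "p" (sent as a string)
    quantity: Decimal     # "q" (sent as a string)
    first_trade_id: int   # "f"
    last_trade_id: int    # "l"
    timestamp_ms: int     # "T"
    buyer_is_maker: bool  # "m"

    @classmethod
    def from_json(cls, raw):
        return cls(
            trade_id=raw["a"],
            price=Decimal(raw["p"]),
            quantity=Decimal(raw["q"]),
            first_trade_id=raw["f"],
            last_trade_id=raw["l"],
            timestamp_ms=raw["T"],
            buyer_is_maker=raw["m"],
        )


sample = {"a": 26129, "p": "0.01633102", "q": "4.70443515", "f": 27781,
          "l": 27781, "T": 1498793709153, "m": True, "M": True}
trade = AggTrade.from_json(sample)
print(trade.price * trade.quantity)  # notional in the quote asset, exact Decimal math
```

The difference last_trade_id - first_trade_id + 1 tells you how many individual trades were merged into one aggregate trade.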

Now that we have the first trade ID, we can fetch trades 1000 at a time until we reach our ending_date. The following function will be called inside our main loop. It performs the request using the fromId parameter, dropping the startTime and endTime parameters.

import requests

def get_trades(symbol, from_id):
    # Up to 1000 aggregate trades, starting at trade ID from_id
    r = requests.get("https://api.binance.com/api/v3/aggTrades",
                     params={
                         "symbol": symbol,
                         "limit": 1000,
                         "fromId": from_id
                     })
    return r.json()
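Because the two request styles have different constraints (at most 1000 trades per call, and startTime/endTime at most 1 hour apart), it can help to build the query parameters in one place and reject invalid combinations before sending anything. This is a hypothetical helper, not part of any Binance client library; the limits encoded in it are the ones stated earlier in the article.

```python
MAX_LIMIT = 1000
MAX_SPAN_MS = 60 * 60 * 1000  # startTime/endTime may be at most 1 hour apart


def agg_trades_params(symbol, from_id=None, start_ms=None, end_ms=None,
                      limit=MAX_LIMIT):
    """Build aggTrades query parameters, checking the limits above (sketch)."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {"symbol": symbol, "limit": limit}
    if from_id is not None:
        params["fromId"] = from_id
    if start_ms is not None and end_ms is not None:
        if not 0 <= end_ms - start_ms <= MAX_SPAN_MS:
            raise ValueError("startTime/endTime must be ordered and at most 1 hour apart")
        params["startTime"] = start_ms
        params["endTime"] = end_ms
    elif start_ms is not None or end_ms is not None:
        raise ValueError("pass both start_ms and end_ms, or neither")
    return params


print(agg_trades_params("BTCUSDT", from_id=26129))
# {'symbol': 'BTCUSDT', 'limit': 1000, 'fromId': 26129}
```

The result can be passed straight to requests.get as its params argument.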

And now, the main loop, which performs the requests and builds our DataFrame.

import time
from datetime import datetime
import pandas as pd

from_id = get_first_trade_id_from_start_date(symbol, from_date)
current_time = 0
df = pd.DataFrame()
while current_time < get_unix_ms_from_date(to_date):
    trades = get_trades(symbol, from_id)
    # The next request starts from (and repeats) the last trade of this batch
    from_id = trades[-1]['a']
    current_time = trades[-1]['T']
    print(f'fetched {len(trades)} trades from id {from_id} @ {datetime.utcfromtimestamp(current_time/1000.0)}')
    df = pd.concat([df, pd.DataFrame(trades)])
    # Don't exceed the request rate limits
    time.sleep(0.5)

So, as long as current_time, which holds the timestamp of the latest trade fetched, is earlier than our to_date, we:

  • fetch the trades using the from_id parameter;
  • update the from_id and current_time variables, both with information from the latest trade fetched;
  • print a nice debug message;
  • pd.concat the trades fetched with the previous trades in our DataFrame;
  • and sleep a little, so that Binance won’t give us an ugly 429 HTTP response.
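A fixed half-second sleep usually keeps the loop under the limit, but if a 429 does arrive it is worth waiting as long as Binance asks. Binance’s API documentation describes a Retry-After header on 429 (rate limited) and 418 (IP banned after ignoring 429s) responses; the backoff_delay function and its base and cap values below are hypothetical choices for illustration.

```python
def backoff_delay(status_code, headers, attempt, base=0.5, cap=60.0):
    """Seconds to wait before retrying a request (sketch).

    On 429 or 418 the Retry-After header takes precedence. Other failures
    use capped exponential backoff. Successful responses need no wait.
    """
    if status_code < 400:
        return 0.0
    retry_after = headers.get("Retry-After")
    if status_code in (418, 429) and retry_after is not None:
        return float(retry_after)
    return min(cap, base * 2 ** attempt)


print(backoff_delay(429, {"Retry-After": "30"}, attempt=0))  # 30.0
print(backoff_delay(500, {}, attempt=3))                      # 4.0
```

With requests, pass r.status_code and r.headers; requests’ header mapping is case-insensitive, so the lookup works however the server capitalizes the header name.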

After assembling our DataFrame, we need to perform some simple data cleaning. We will remove the duplicates (fromId is inclusive, so each request starts by repeating the last trade of the previous batch) and trim the trades that happened after our to_date (we have that “problem” because we’re fetching in chunks of 1000 trades, so it’s expected that we get some trades executed after our target end date).

We can encapsulate our “trim” functionality:

def trim(df, to_date):
    # Keep only trades executed up to (and including) to_date
    return df[df['T'] <= get_unix_ms_from_date(to_date)]

And perform our data cleaning:

df.drop_duplicates(subset='a', inplace=True)
df = trim(df, to_date)
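Here is the same cleaning on two tiny overlapping pages, so the effect is visible. This is a sketch with made-up rows and a hypothetical clean_trades function that works on a millisecond cut-off; it also sorts by ID and resets the index, because pd.concat keeps each page’s own index labels, which therefore repeat (passing ignore_index=True to pd.concat is another way to avoid that).

```python
import pandas as pd


def clean_trades(df, to_ms):
    """Drop repeated boundary trades, trim past to_ms, and sort by ID (sketch)."""
    df = df.drop_duplicates(subset="a")
    df = df[df["T"] <= to_ms]
    return df.sort_values("a").reset_index(drop=True)


# Two overlapping pages, as produced by fromId paging: trade 3 appears twice
page1 = pd.DataFrame({"a": [1, 2, 3], "T": [100, 200, 300]})
page2 = pd.DataFrame({"a": [3, 4, 5], "T": [300, 400, 500]})
raw = pd.concat([page1, page2])
result = clean_trades(raw, to_ms=400)
print(result["a"].tolist())  # [1, 2, 3, 4]
```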

Now, we can save it to a file using the to_csv method:

filename = f'binance__{symbol}__trades__from__{sys.argv[2].replace("/", "_")}__to__{sys.argv[3].replace("/", "_")}.csv'
df.to_csv(filename)
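By default, to_csv also writes the DataFrame’s index as an unnamed first column, so when the file is read back, the trade ID a is no longer the first column. Passing index=False avoids that extra column. The round-trip sketch below uses an in-memory buffer and made-up sample rows instead of a real file, and reads prices back as strings so their exact decimal text is preserved.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [26129, 26130],
                   "p": ["0.01633102", "0.01633200"],
                   "T": [1498793709153, 1498793709160]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # omit the unnamed index column
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"p": str})  # keep prices as exact strings
print(restored.columns.tolist())  # ['a', 'p', 'T']
```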

We can also use other data storage mechanisms, such as Arctic.

It’s important that we can trust our data when working with trading strategies. With the fetched trade data, that’s easy to verify:

import sys
import pandas as pd

df = pd.read_csv(filename)
values = df.to_numpy()
# Column 0 holds the index written by to_csv, so column 1 is the trade ID 'a'
last_id = values[0][1]
for row in values[1:]:
    trade_id = row[1]
    if last_id + 1 != trade_id:
        print('last_id', last_id)
        print('trade_id', trade_id)
        print('inconsistent data')
        sys.exit(1)
    last_id = trade_id
print('data is OK!')

In the snippet, we convert our DataFrame to a NumPy array and iterate row by row, checking that each trade ID is exactly 1 greater than the one before it.

Binance numbers trade IDs sequentially for each symbol, so any gap in the sequence means missing data, which makes verifying your data really easy.
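Rather than stopping at the first inconsistency, it can be more useful to list every gap so the missing ranges can be re-fetched with fromId. This is a sketch in plain Python; find_id_gaps is a hypothetical helper and expects the IDs already sorted in ascending order (for example, df['a'].tolist() after the cleaning step).

```python
def find_id_gaps(trade_ids):
    """Return (previous_id, next_id) pairs wherever consecutive IDs skip (sketch)."""
    return [(prev, nxt)
            for prev, nxt in zip(trade_ids, trade_ids[1:])
            if nxt != prev + 1]


ids = [100, 101, 102, 105, 106, 110]
gaps = find_id_gaps(ids)
print(gaps)  # [(102, 105), (106, 110)]
missing = sum(nxt - prev - 1 for prev, nxt in gaps)
print(missing)  # 5
```

Each pair marks a hole: re-requesting with fromId set to prev + 1 would fill it.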

The first step to create a successful trading strategy is to have the right data.

My Algotrading series is a work in progress, so I welcome any feedback or suggestions you leave in the comment section. You can check out the full code of this tiny tutorial in my GitHub repository.

I hope you enjoyed reading this post. Thank you for your time.

Take care and keep coding!

Source: https://medium.com/better-programming/how-to-easily-fetch-your-binance-historical-trades-using-python-174a6569cebd
