AI For Trading: Sector Neutral Exercise (62)
Sector Neutral
Install packages
import sys
!{sys.executable} -m pip install -r requirements.txt
import cvxpy as cvx
import numpy as np
import pandas as pd
import time
import os
import quiz_helper
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)
Following the zipline bundle documentation:
http://www.zipline.io/bundles.html#ingesting-data-from-csv-files
Data bundle
import os
import quiz_helper
from zipline.data import bundles
os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')
ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)
bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)
print('Data Registered')
Data Registered
Build pipeline engine
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume
from zipline.utils.calendars import get_calendar
universe = AverageDollarVolume(window_length=120).top(500)
trading_calendar = get_calendar('NYSE')
bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)
Note: running this cell before the data bundle has been ingested raises a FileNotFoundError, followed by:
ValueError: no data for bundle 'm4-quiz-eod-quotemedia' on or before 2019-05-03 03:00:25.193775+00:00
maybe you need to run: $ zipline ingest -b m4-quiz-eod-quotemedia
View Data
With the pipeline engine built, let's get the stocks at the end of the period in the universe we're using. We'll use these tickers to generate the returns data for our risk model.
universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')
universe_tickers = engine\
.run_pipeline(
Pipeline(screen=universe),
universe_end_date,
universe_end_date)\
.index.get_level_values(1)\
.values.tolist()
universe_tickers
Get Returns data
from zipline.data.data_portal import DataPortal
data_portal = DataPortal(
bundle_data.asset_finder,
trading_calendar=trading_calendar,
first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
equity_minute_reader=None,
equity_daily_reader=bundle_data.equity_daily_bar_reader,
adjustment_reader=bundle_data.adjustment_reader)
Get pricing data helper function
def get_pricing(data_portal, trading_calendar, assets, start_date, end_date, field='close'):
    # Normalize the requested dates and look them up on the trading calendar
    end_dt = pd.Timestamp(end_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')
    start_dt = pd.Timestamp(start_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')
    end_loc = trading_calendar.closes.index.get_loc(end_dt)
    start_loc = trading_calendar.closes.index.get_loc(start_dt)
    # Pull a window of daily bars ending at end_dt
    return data_portal.get_history_window(
        assets=assets,
        end_dt=end_dt,
        bar_count=end_loc - start_loc,
        frequency='1d',
        field=field,
        data_frequency='daily')
Get pricing data into a dataframe
returns_df = \
get_pricing(
data_portal,
trading_calendar,
universe_tickers,
universe_end_date - pd.DateOffset(years=5),
universe_end_date)\
.pct_change()[1:].fillna(0) #convert prices into returns
returns_df
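The `.pct_change()[1:].fillna(0)` conversion above can be illustrated on a toy price table (made-up numbers, not the bundle data):

```python
import pandas as pd

# Toy close prices for two hypothetical tickers
prices = pd.DataFrame({
    'AAA': [100.0, 102.0, 101.0],
    'BBB': [50.0, 50.0, 55.0],
})

# pct_change computes (p_t - p_{t-1}) / p_{t-1}; the first row is NaN,
# so we drop it with [1:] and fill any remaining gaps with 0, as above
returns = prices.pct_change()[1:].fillna(0)
print(returns)
```

The first retained row for `AAA` is (102 - 100) / 100 = 0.02, matching the formula applied per column.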
Sector data helper function
We'll create an object that defines a sector for each stock. The sectors are represented by integers. We inherit from the Classifier class; see the documentation and source code for Classifier for details.
from zipline.pipeline.classifiers import Classifier
from zipline.utils.numpy_utils import int64_dtype
class Sector(Classifier):
dtype = int64_dtype
window_length = 0
inputs = ()
missing_value = -1
def __init__(self):
self.data = np.load('../../data/project_4_sector/data.npy')
def _compute(self, arrays, dates, assets, mask):
return np.where(
mask,
self.data[assets],
self.missing_value,
)
sector = Sector()
sector
len(sector.data)
sector.data
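To see what `_compute` is doing, here is a minimal sketch of the same `np.where` lookup with toy arrays (the sector codes, sids, and mask below are made up for illustration):

```python
import numpy as np

# Toy stand-in for Sector.data: sector codes indexed by asset sid
sector_data = np.array([3, 1, 7, -1, 5])
assets = np.array([0, 2, 4])          # sids present in this pipeline window
mask = np.array([True, False, True])  # which assets pass the screen

# np.where keeps the looked-up sector where mask is True,
# and emits the missing_value (-1) everywhere else
out = np.where(mask, sector_data[assets], -1)
print(out)  # [ 3 -1  5]
```

Masked-out assets get `-1`, the same `missing_value` the class declares.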
Quiz 1
How many unique sectors are in the sector variable?
Answer 1
There are 11 sector categories (0 through 10); -1 represents missing values.
print(f"set of unique categories: {set(sector.data)}")
Create an alpha factor based on momentum
We want to calculate the one-year return.
In other words, get the close price of today, minus the close price of 252 trading days ago, and divide by that price from 252 days ago.
$1YearReturn_{t} = \frac{price_{t} - price_{t-252}}{price_{t-252}}$
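As a worked example of the formula (with made-up prices):

```python
# 1-year return: (price today - price 252 trading days ago) / price 252 days ago
price_t = 120.0      # hypothetical close today
price_t_252 = 100.0  # hypothetical close 252 trading days ago

one_year_return = (price_t - price_t_252) / price_t_252
print(one_year_return)  # 0.2, i.e. a 20% one-year return
```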
from zipline.pipeline.factors import Returns
We'll use 2 years of data to calculate the factor
Note: going back exactly 2 years lands on a day when the market is closed, and the Pipeline package doesn't handle start or end dates that fall on non-trading days. To fix this, we go back 2 extra days so the start date falls on a day when the market is open.
factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)
factor_start_date
## 1 year returns can be the basis for an alpha factor
p1 = Pipeline(screen=universe)
rets1 = Returns(window_length=252, mask=universe)
p1.add(rets1,"1YearReturns")
df1 = engine.run_pipeline(p1, factor_start_date, universe_end_date)
#graphviz lets us visualize the pipeline
import graphviz
p1.show_graph(format='png')
View the data of the factor
df1.head()
Explore the demean function
The Returns class inherits from zipline.pipeline.factors.factor.
The documentation for demean is located here, and is also pasted below:
demean(mask=sentinel('NotSpecified'), groupby=sentinel('NotSpecified'))[source]
Construct a Factor that computes self and subtracts the mean from each row of the result.
If mask is supplied, ignore values where mask returns False when computing row means, and output NaN anywhere the mask is False.
If groupby is supplied, compute by partitioning each row based on the values produced by groupby, de-meaning the partitioned arrays, and stitching the sub-results back together.
Parameters:
mask (zipline.pipeline.Filter, optional) – A Filter defining values to ignore when computing means.
groupby (zipline.pipeline.Classifier, optional) – A classifier defining partitions over which to compute means.
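The groupby behavior described above can be sketched in plain pandas: for one cross-sectional row of returns, subtract each group's mean from its members. The tickers, returns, and sector labels below are made up for illustration; zipline's demean does this per date, per partition.

```python
import pandas as pd

# One cross-sectional row of 1-year returns with hypothetical sector labels
returns = pd.Series([0.10, 0.30, -0.05, 0.15], index=['A', 'B', 'C', 'D'])
sectors = pd.Series([0, 0, 1, 1], index=['A', 'B', 'C', 'D'])

# Group-wise demean: subtract each sector's mean return from its members
demeaned = returns - returns.groupby(sectors).transform('mean')
print(demeaned)
```

Sector 0 has mean 0.20 and sector 1 has mean 0.05, so after demeaning each sector's returns sum to zero, which is what makes the resulting factor sector neutral.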
Quiz 2
By looking at the documentation, and then the source code for demean, what are the two parameters of this function? Which one (or both) would you use if you wanted to demean by sector while keeping all values in the chosen universe?
The source code has useful comments to help you answer this question.
Answer 2
We would use the groupby parameter, and we don't need to use the mask parameter, since we are not going to exclude any of the stocks in the universe from the demean calculation.
Quiz 3
Turn 1 year returns into an alpha factor
We can do some processing to convert our signal (1 year return) into an alpha factor. One step is to demean by sector.
- demean
For each stock, we want to take the average return of stocks that are in the same sector, and then remove this from the return of each individual stock.
Answer 3
#TODO
# create a pipeline called p2
p2 = Pipeline(screen=universe)
# create a factor of one year returns, demeaned by sector
factor_demean_by_sector = (
Returns(window_length=252, mask=universe).
demean(groupby=Sector()) #we use the custom Sector class that we reviewed earlier
)
# add the factor to the p2 pipeline
p2.add(factor_demean_by_sector, 'Momentum_1YR_demean_by_sector')
visualize the second pipeline
p2.show_graph(format='png')
Quiz 4
How does this pipeline compare with the first pipeline that we created earlier?
Answer 4
The second pipeline now adds sector information in the GroupedRowTransform('demean') step.
run pipeline and view the factor data
df2 = engine.run_pipeline(p2, factor_start_date, universe_end_date)
df2.head()