CSCE 824 – Secure Database Systems
Spring 2019
Lecture Notes
Jan. 15. Introduction and Relational databases
overview
Review:
CSCE 520 lecture notes, https://cse.sc.edu/~farkas/csce520-2015/csce520.htm
Jan. 17. Overview of
Relational Data Models and Distributed Databases (slides)
Reading:
1.
A. El Abbadi and S. Toueg. 1989.
Maintaining availability in partitioned replicated databases. ACM Trans.
Database Syst. 14, 2 (June 1989), 264-290. https://dl.acm.org/citation.cfm?id=63501
Interesting Reading:
2.
Uwe Röhm, Michael J. Cahill, Alan Fekete, Hyungsoo
Jung, Seung Woo Baek, and Mathew Rodley.
2013. Robust snapshot replication. In Proceedings of the Twenty-Fourth
Australasian Database Conference - Volume 137 (ADC '13), Hua Wang and Rui Zhang
(Eds.), Vol. 137. Australian Computer Society, Inc., Darlinghurst, Australia,
81-91. https://dl.acm.org/citation.cfm?id=2525425
3.
Tudor-Ioan Salomie, Ionut
Emanuel Subasu, Jana Giceva,
and Gustavo Alonso. 2011. Database engines on multicores, why parallelize when
you can distribute?. In Proceedings of the sixth
conference on Computer systems (EuroSys '11). ACM,
New York, NY, USA, 17-30. https://dl.acm.org/citation.cfm?id=1966448
Jan. 22. Big Data
Analytics, HADOOP Architecture
Reading:
1.
HDFS Architecture
Guide, https://hadoop.apache.org/docs/r1.2.1/hdfs_design.pdf
2.
Hive –
Introduction, https://www.tutorialspoint.com/hive/hive_introduction.htm
Jan. 24-29. Security Primer (slides)
Reading:
CSCE 522 lecture notes, https://cse.sc.edu/~farkas/csce522/csce522.htm
Jan. 31.
Access Control Models (slides)
Reading:
1.
S. De
Capitani di Vimercati, P. Samarati, S. Jajodia: Policies, Models, and Languages for
Access Control, in Databases in Networked Information Systems, Volume 3433 of
the series Lecture Notes in Computer Science pp 225-237, http://spdp.di.unimi.it/papers/2005-DNIS.pdf
Febr. 5. Properties of
Access Control Models (slides)
1.
Charles Morisset and Nicola Zannone. 2014. Reduction of access
control decisions. In Proceedings of the 19th ACM symposium on Access control
models and technologies (SACMAT '14). ACM, New York, NY, USA, 53-62. https://dl.acm.org/citation.cfm?id=2613106
Febr. 7. The Inference Problem: General Inference
Problem (slides), web inferences (slides3), Statistical
databases (slides3)
Reading:
1.
Davide Alberto
Albertini, Barbara Carminati, and Elena Ferrari. 2017. An extended access
control mechanism exploiting data dependencies. Int. J. Inf. Secur. 16, 1 (February 2017), 75-89. https://link.springer.com/article/10.1007/s10207-016-0322-4
2.
L.
Sweeney. Weaving Technology and Policy Together to Maintain Confidentiality.
Journal of Law, Medicine & Ethics, 25, nos. 2&3 (1997): 98-110. (http://onlinelibrary.wiley.com/doi/10.1111/j.1748-720X.1997.tb01885.x/epdf )
Febr. 12. Big Data Access
Control (slides)
Reading:
1.
L. Sweeney. Weaving Technology and Policy Together
to Maintain Confidentiality. Journal of Law, Medicine & Ethics, 25, nos.
2&3 (1997): 98-110. (http://onlinelibrary.wiley.com/doi/10.1111/j.1748-720X.1997.tb01885.x/epdf )
2.
Amin Beheshti, Boualem Benatallah, Reza Nouri, Van Munin
Chhieng, HuangTao Xiong, and Xu Zhao. 2017. CoreDB:
a Data Lake Service. In Proceedings of the 2017 ACM on Conference on
Information and Knowledge Management (CIKM '17). ACM, New York, NY, USA,
2451-2454. https://dl.acm.org/citation.cfm?id=3133171
3.
Mina Farid, Alexandra Roatis,
Ihab F. Ilyas, Hella-Franziska Hoffmann, and Xu Chu. 2016. CLAMS: Bringing
Quality to Data Lakes. In Proceedings of the 2016 International Conference on
Management of Data (SIGMOD '16). ACM, New York, NY, USA, 2089-2092. https://dl.acm.org/citation.cfm?id=2899391
Febr. 17. Data Provenance
(slides)
Reading:
1.
Peter Buneman and Wang-Chiew Tan. 2007.
Provenance in databases. In Proceedings of the 2007 ACM SIGMOD international
conference on Management of data (SIGMOD '07). ACM, New York, NY, USA,
1171-1173.
Interesting read (not required):
1.
Yael Amsterdamer, Susan B. Davidson, Daniel Deutch,
Tova Milo, Julia Stoyanovich, and Val Tannen. 2011.
Putting lipstick on pig: enabling database-style workflow provenance. Proc.
VLDB Endow. 5, 4 (December 2011), 346-357. https://dl.acm.org/citation.cfm?id=2095693
2.
Eleanor Ainy, Pierre Bourhis, Susan B.
Davidson, Daniel Deutch, and Tova Milo. 2015.
Approximated Summarization of Data Provenance. In Proceedings of the 24th ACM
International on Conference on Information and Knowledge Management (CIKM '15).
ACM, New York, NY, USA, 483-492. https://dl.acm.org/citation.cfm?id=2806429
Febr. 19 – 21. No Classes: Work on Project 1
Febr. 26. Data Provenance
(slides) cont.
Reading:
1.
Melanie Herschel,
Ralf Diestelkämper, and Houssem
Ben Lahmar. 2017. A survey on provenance: What for?
What form? What from?. The VLDB Journal 26, 6
(December 2017), 881-906. https://dl.acm.org/citation.cfm?id=3159194
Interesting read (not required):
3.
Muhammad Naveed
Aman, Kee Chaing Chua, and Biplab
Sikdar. 2017. Secure Data Provenance for the Internet
of Things. In Proceedings of the 3rd ACM International Workshop on IoT Privacy,
Trust, and Security (IoTPTS '17). ACM, New York, NY,
USA, 11-14. https://dl.acm.org/citation.cfm?id=3055255
4.
Matteo Interlandi, Ari Ekmekji, Kshitij Shah, Muhammad Ali Gulzar, Sai Deep Tetali, Miryung Kim, Todd Millstein,
and Tyson Condie. 2018. Adding data provenance
support to Apache Spark. The VLDB Journal 27, 5 (October 2018), 595-615. https://dl.acm.org/citation.cfm?id=3283005
Febr. 28. XML and XML Security
(slides)
Reading:
1.
XML Primer, W3C, http://www.w3c.it/education/2012/upra/documents/xmlprimer.pdf
2.
Ernesto Damiani, Sabrina De Capitani
di Vimercati, Stefano Paraboschi,
and Pierangela Samarati.
2002. A fine-grained access control system for XML documents. ACM Trans. Inf.
Syst. Secur. 5, 2 (May 2002), 169-202. https://dl.acm.org/citation.cfm?id=505590
March 5. XML Database (slides)
Reading:
1.
XML Primer, W3C, http://www.w3c.it/education/2012/upra/documents/xmlprimer.pdf
2.
Ernesto Damiani, Sabrina De Capitani
di Vimercati, Stefano Paraboschi,
and Pierangela Samarati.
2002. A fine-grained access control system for XML documents. ACM Trans. Inf.
Syst. Secur. 5, 2 (May 2002), 169-202. https://dl.acm.org/citation.cfm?id=505590
March 7. XML
normalization (slides)
Reading:
1.
Cong Yu and H. V. Jagadish. 2008. XML schema
refinement through redundancy detection and normalization. The VLDB Journal 17,
2 (March 2008), 203-223. https://dl.acm.org/citation.cfm?id=1342417
2.
Millist W. Vincent, Jixue Liu, and Chengfei Liu.
2004. Strong functional dependencies and their application to normal forms in
XML. ACM Trans. Database Syst. 29, 3 (September 2004), 445-462. https://dl.acm.org/citation.cfm?id=1016029
3.
Serge Abiteboul, Georg Gottlob, and Marco Manna. 2009. Distributed XML design. In
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on
Principles of database systems (PODS '09). ACM, New York, NY, USA, 247-258. https://dl.acm.org/citation.cfm?id=1559833
March 12-14: Spring Break
March 19. XML Inferences (slides)
Reading:
1.
Andrei Stoica and Csilla Farkas. 2004. Ontology guided XML security engine.
J. Intell. Inf. Syst. 23, 3 (November 2004),
209-223. https://cse.sc.edu/~farkas/papers/journal13.pdf
March 21-26. Streaming Data (slides) – Theppatorn Rhujittawiwat
[1]
Reading:
1.
Daniel J. Abadi, Don Carney, Ugur
Çetintemel, Mitch Cherniack,
Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. 2003.
Aurora: a new model and architecture for data stream management. The VLDB
Journal 12, 2 (August 2003), 120-139. https://dl.acm.org/citation.cfm?id=950485
2.
Barbara Carminati, Elena Ferrari, Jianneng Cao, and Kian Lee Tan. 2010. A framework to
enforce access control over data streams. ACM Trans. Inf. Syst. Secur. 13, 3, Article 28 (July 2010), 31 pages. https://dl.acm.org/citation.cfm?id=1805984
March 28. Cloud Databases (slides)
Microsoft Azure
(slides) – Josh Gregory [2]
Reading:
1.
Jun Tang, Yong Cui, Qi Li, Kui
Ren, Jiangchuan Liu, and Rajkumar Buyya.
2016. Ensuring Security and Privacy Preservation for Cloud Data Services. ACM Comput. Surv. 49, 1, Article 13
(June 2016), 39 pages. https://dl.acm.org/citation.cfm?id=2906153
2.
Microsoft Azure, https://azure.microsoft.com/en-us/get-started/
April 2. Data Analytics (slides)
Context Matters: How software
vulnerabilities impact data security (slides) – Kimberly Redmond [2]
Reading:
1.
Latifur Khan. 2018. Big
IoT Data Stream Analytics with Issues in Privacy and Security. In Proceedings
of the Fourth ACM International Workshop on Security and Privacy Analytics
(IWSPA '18). ACM, New York, NY, USA, 22-22. https://dl.acm.org/citation.cfm?id=3180455
2.
Kang, Boojoong, et al. "Malware
classification method via binary content comparison." Proceedings of the
2012 ACM Research in Applied Computation Symposium. ACM, 2012. https://dl-acm-org.pallas2.tcl.sc.edu/citation.cfm?id=2401672
Interesting:
3.
Zuo, Fei, et al. "Neural
machine translation inspired binary code similarity comparison beyond function
pairs." arXiv preprint arXiv:1808.04706 (2018).
https://arxiv.org/abs/1808.04706
4.
Seyed Mohammad Ghaffarian and
Hamid Reza Shahriari. 2017. Software Vulnerability
Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A
Survey. ACM Comput. Surv.
50, 4, Article 56 (August 2017), 36 pages. https://dl.acm.org/citation.cfm?id=3092566
April 4. Privacy Preserving Cloud
Computing (slides) – Fengyao Yan [1]
Privacy and
DM Applications (slides)
– Saivenkatanikhil Nimmagadda
[2]
Big Data
Credibility (slides) – Salhulding Alquarghuli
[3]
Reading:
1.
Xun Yi, Fang-Yu Rao, Elisa Bertino, and Athman Bouguettaya. 2015. Privacy-Preserving Association Rule
Mining in Cloud Computing. In Proceedings of the 10th ACM Symposium on
Information, Computer and Communications Security (ASIA CCS '15). ACM, New
York, NY, USA, 439-450. https://dl.acm.org/citation.cfm?id=2714603
2.
Ashwin Machanavajjhala,
Daniel Kifer, Johannes Gehrke,
and Muthuramakrishnan Venkitasubramaniam.
2007. L-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl.
Discov. Data 1, 1, Article 3 (March 2007). https://dl.acm.org/citation.cfm?id=1217302
3.
Shi Zhi, Yicheng Sun, Jiayi Liu, Chao
Zhang, and Jiawei Han. 2017. ClaimVerif: A Real-time
Claim Verification System Using the Web and Fact Databases. In Proceedings of
the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17).
ACM, New York, NY, USA, 2555-2558. https://dl.acm.org/citation.cfm?id=3133182
April 9. Database Intrusion Detection (slides) – Matthew Heightland
[1]
Use of
Provenance for Intrusion Detection (slides)
– Rohit Naini [2]
Data
Analytics for Attack Detection (slides) – Xiya Xia [3]
Reading:
1.
Mohammad Saiful Islam, Mehmet Kuzu,
and Murat Kantarcioglu. 2015. A Dynamic Approach to
Detect Anomalous Queries on Relational Databases. In Proceedings of the 5th ACM
Conference on Data and Application Security and Privacy (CODASPY '15). ACM, New
York, NY, USA, 245-252. https://dl.acm.org/citation.cfm?id=2699120
2.
Ragib Hasan, Radu
Sion, and Marianne Winslett. 2009. Preventing history
forgery with secure provenance. Trans. Storage 5, 4, Article 12 (December
2009), 43 pages. https://dl.acm.org/citation.cfm?id=1629082
3.
Peng Gao, Xusheng Xiao, Zhichun Li, Kangkook Jee, Fengyuan Xu, Sanjeev R
Kulkarni, and Prateek Mittal. AIQL: enabling
efficient attack investigation from system monitoring data. In Proceedings of
the 2018 USENIX Conference on Usenix Annual Technical
Conference, pages 113–125. USENIX Association, 2018. https://www.usenix.org/system/files/conference/atc18/atc18-gao.pdf
April 11. Data security needs in high-performance
computing (HPC) (slides)
[1]
Data
Provenance (slides) – Denise
Davis [2]
Identity
Management (slides) – Marc
Bowman [3]
Reading:
1.
Melanie Herschel,
Ralf Diestelkämper, and Houssem
Ben Lahmar. 2017. A survey on provenance: What for?
What form? What from?. The VLDB Journal 26, 6
(December 2017), 881-906. https://dl.acm.org/citation.cfm?id=3159194
2.
Matteo Interlandi, Ari Ekmekji, Kshitij Shah, Muhammad
Ali Gulzar, Sai Deep Tetali, Miryung
Kim, Todd Millstein, and Tyson Condie. 2018. Adding
data provenance support to Apache Spark. The VLDB Journal 27, 5 (October 2018),
595-615. https://dl.acm.org/citation.cfm?id=3283005
3.
Susmita Horrow and Anjali Sardana. 2012. Identity management
framework for cloud based internet of things. In Proceedings of the First
International Conference on Security of Internet of Things (SecurIT
'12). ACM, New York, NY, USA, 200-203. https://dl.acm.org/citation.cfm?id=2490456
April 16. Cloud applications (slides) – Harrison Howell [1]
Privacy and
Machine Learning (slides) – Nick
Rhodes [2]
Identity
Management (slides) – Marc Bowman [3]
Reading:
1.
Vlado Stankovski, Salman Taherizadeh, Ian Taylor, Andrew Jones, Bruce Becker, Carlo
Mastroianni, and Heru Suhartanto.
2015. Towards an environment supporting resilience, high-availability,
reproducibility and reliability for cloud applications. In Proceedings of the
8th International Conference on Utility and Cloud Computing (UCC '15). IEEE
Press, Piscataway, NJ, USA, 383-386. DOI: https://doi.org/10.1109/UCC.2015.61
2.
N. Papernot, P. McDaniel, A.
Sinha and M. P. Wellman, "SoK: Security and
Privacy in Machine Learning," 2018 IEEE European Symposium on Security and
Privacy (EuroS&P), London, 2018, pp. 399-414. http://www-personal.umich.edu/~arunesh/Files/Other/Papers/18-eurosp-adv-ml-sok.pdf
3.
Susmita Horrow and Anjali Sardana. 2012. Identity management
framework for cloud based internet of things. In Proceedings of the First
International Conference on Security of Internet of Things (SecurIT
'12). ACM, New York, NY, USA, 200-203. https://dl.acm.org/citation.cfm?id=2490456
April 18. Internet of Things and Access
Control (slides)
– Andrew Cox [1]
Threats to
Privacy in the Forensic Analysis of Database Systems (slides)
– Andrew Michels [2]
Data
Provenance (slides) – Denise
Davis [3]
Reading:
1.
Bertin, Emmanuel, et
al. "Access Control in the Internet of Things: a Survey of Existing
Approaches and Open Research Questions." Annals of Telecommunications,
2019. https://link.springer.com/article/10.1007/s12243-019-00709-7
2.
Stahlberg, Patrick, et al. “Threats to Privacy in
the Forensic Analysis of Database Systems.” Proceedings of the 2007 ACM SIGMOD
International Conference on Management of Data -SIGMOD '07, 2007. https://dl.acm.org/citation.cfm?id=1247492
3.
Matteo Interlandi, Ari Ekmekji, Kshitij Shah, Muhammad
Ali Gulzar, Sai Deep Tetali, Miryung
Kim, Todd Millstein, and Tyson Condie. 2018. Adding
data provenance support to Apache Spark. The VLDB Journal 27, 5 (October 2018),
595-615. https://dl.acm.org/citation.cfm?id=3283005
April 23. Anomalous Database Transaction
detection (slides) -- Harshith Reddy Sarabudla [1]
Medical data
privacy (slides1,
slides2)
– Jiexi Wang [2]
Reading:
1.
Syed Rafiul Hussain, Asmaa M. Sallam, Elisa Bertino, “DetAnom: Detecting
Anomalous Database Transactions by Insiders”. 5th ACM Conference on Data and
Application Security and Privacy 2015. https://dl.acm.org/citation.cfm?id=2699111
2.
M. Marwan, A. Kartit, and
H. Ouahmane. 2017. Design a Secure Framework for
Cloud-Based Medical Image Storage. In Proceedings of the 2nd international
Conference on Big Data, Cloud and Applications (BDCA'17). ACM, New York, NY,
USA, Article 7, 6 pages. https://dl.acm.org/citation.cfm?id=3090361
April 25. Review Lecture and Dark Data and
Cybersecurity (slides)
Reading:
1.
Ce Zhang, Jaeho Shin,
Christopher Ré, Michael Cafarella,
and Feng Niu. 2016. Extracting Databases from Dark
Data with DeepDive. In Proceedings of the 2016
International Conference on Management of Data (SIGMOD '16). ACM, New York, NY,
USA, 847-859. https://dl.acm.org/citation.cfm?id=2904442
May 7. 4:00 pm – 6:30 pm
Final Project
Presentations
1.
Salhuldin
2.
Bowman
3.
Cox
4.
Gregory
5.
Heightland
6.
Howell
7.
Michels
8.
Naini
9.
Nimmagadda
10.
Redmond
11.
Theppatorn
12.
Sarabudla
13.
Wang
14.
Xia
15.
Yan
Logistics:
1.
Upload your FINAL Report by May 5, 11:55 pm. Note: all reports will be posted in Dropbox for the rest of the class.
2.
Upload your 4-minute presentation to Dropbox by May
6, 11:55 pm
On May 7, 2019
1.
I will post the presentations on the class website
before class
2.
Each student will have exactly 4 minutes to present
their work (4:00 – 5:10 pm)
3.
We will use open voting for each project to rank
them (5:10 – 5:30 pm) – you can promote your project
4.
Discussion on projects and identifying future
possibilities (5:30 – 6:00 pm)
5.
Revisit ranking to select final top 3 (6:00 – 6:30
pm)