Tamaghna Banerjee

Charlotte, NC

Summary

Seasoned Data Engineer with more than eight years of end-to-end data platform experience, delivering high-performance, scalable solutions across cloud and on-premises environments. Expert in building streaming and batch pipelines with Python, PySpark, and SQL on Azure Databricks and Hadoop ecosystems, enabling single sources of truth and accelerating analytics for Fortune 500 retail and financial clients. Proven track record of slashing processing times and improving data reliability through intelligent job migration, automation, and Delta Lake governance with Unity Catalog. Adept at DevOps practices with Git and Jenkins, certified in Databricks and SAFe®, and recognized for leading cross-functional teams to deliver secure, well-governed data products that drive business value.

Overview

15 years of professional experience
1 Certification

Work History

Senior Data Engineer

CAPGEMINI
04.2023 - Current
  • Project: Single source of truth (SSOT) data platform with streaming data pipelines for feature engineering and data analytics
  • Tech stack: Python, Spark, Azure Databricks, SQL, Azure, Git, Jenkins
  • Designed and implemented streaming data pipelines in PySpark on Azure Databricks for a major US-based retail organization.
  • Created Databricks workspaces, including deployment pipelines for production and non-production environments.
  • Developed and deployed more than 120 Delta tables on Azure Data Lake Storage Gen2.
  • Configured an external metastore to register Delta tables.
  • Implemented data governance with Unity Catalog.

Data Engineer

COGNIZANT
06.2020 - 03.2023
  • Project: Data migration from relational data stores to HDFS and Hive
  • Tech stack: Python, Spark, SQL, Hadoop, Hive, Oracle, Unix, Bash, Autosys, Git, Jenkins
  • Developed batch ingestion pipelines, driven by Unix shell scripts, that extract data from relational databases over JDBC and load it as Parquet files into the Hadoop Distributed File System (HDFS) data lake.
  • Designed and developed PySpark transformations to curate data into Hive tables.
  • Automated rollback on failure with Bash scripts, reducing manual intervention with a target of a 20% improvement in overall batch execution efficiency.
  • Migrated more than 80 jobs from Hadoop to Spark, reducing the batch SLA from more than 8 hours to 4.5 hours.
  • Project: Batch data ingestion and transformation from Oracle to EDW using Informatica PowerCenter
  • Tech stack: Python, Pandas, SQL, Unix, Shell Script, Autosys, Informatica PowerCenter
  • Developed data ingestion pipelines with Unix shell scripts and Informatica PowerCenter to load data from source systems such as flat files and the EDW into Oracle Database.
  • Developed data transformation logic using Informatica active and passive transformations.
  • Created database objects and data models for more than 50 tables and views.
  • Optimized file ingestion with XML data processing in Informatica, improving performance by 25%.

Software Engineer, Mainframe

COGNIZANT
11.2010 - 05.2020
  • Project: OLAP data load into DB2 from flat files
  • Tech stack: SQL, IBM Mainframe, DB2, COBOL
  • Developed and deployed COBOL mainframe jobs to transform data from flat files and load it into DB2.
  • Developed JCL jobs and executed them as part of the batch process.
  • Tested and maintained applications using Easytrieve and REXX.
  • Designed test automation frameworks and executed test cases.

Education

Bachelor of Technology - Electronics and Communication Engineering

West Bengal University of Technology
India
07.2010

Skills

  • Python, SQL, Bash
  • Spark, Hadoop, Hive, Oracle, DB2, MySQL, Apache Airflow
  • Azure Databricks, ADLS, Azure Data Factory, Azure Key Vault, Azure Synapse, AWS EC2, S3, Glue
  • Git, Jenkins, Groovy, YAML

Certification

  • Databricks Partner Program: Solutions Architect Champion
  • MTA: Introduction to Programming Using Python
  • Certified SAFe 4 Scrum Master
  • SI Associate – MongoDB

Languages

English
Full Professional
