[Remote] Staff Data Engineer
Note: This is a remote job open to candidates in the USA. Walmart is a leading retail company and is seeking a Staff Data Engineer. The role involves establishing and implementing data governance practices, building data infrastructure, and ensuring data quality and compliance while collaborating with various business stakeholders.
Responsibilities
- Establish, modify, and document data governance projects and recommendations
- Implement data governance practices in partnership with business stakeholders and peers
- Interpret company and regulatory policies on data
- Educate others on data governance processes, practices, policies, and guidelines
- Provide recommendations on needed updates or inputs into data governance policies, practices, or guidelines
- Translate/co-own business problems within one's discipline into data-related or mathematical solutions
- Identify appropriate methods/tools to be leveraged to provide a solution for the problem
- Share use cases and give examples to demonstrate how the method would solve the business problem
- Understand the priority order of requirements and service level agreements
- Define and identify the most suitable sources for required data that is fit for purpose, referring to external sources as required
- Perform initial data quality checks on the extracted data
- Review the deliverables of junior associates and provide guidance on data source and quality
- Build the infrastructure required for optimal transformation and integration from a wide variety of data sources using appropriate data integration technologies
- Use modern tools, techniques, and architecture to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks
- Deploy pipelines using scheduling and orchestration frameworks
- Evaluate impacts of data issues and risks at an early stage
- Identify needs and create methods to fuse and reshape complex, multi-source data and make it usable for modeling
- Update knowledge of current and emerging big data analytics and data science trends and techniques
- Build complex logical and conceptual models and provide guidance to team on physical data models
- Identify and define the appropriate techniques for exposing data to other systems
- Review and provide guidance and input on all data modeling activities to team members
- Create and maintain critical data documentation and metadata that allows data to be understood and leveraged as a shared asset
- Assist in defining data modeling standards and foundational best practices
- Provide inputs to the architectural design to make best use of the available resources, given goals, and expected loads
- Review the solution and application design to ensure it meets business, technical, and data requirements
- Identify language and libraries to use in the development process
- Map test cases to business and functional requirements
- Create proof of concepts
- Review and troubleshoot code in line with final designs
- Identify and recommend the appropriate testing methodology
- Identify the environment(s) for deployment
- Identify and recommend modifications of application based on different environment requirements
- Identify modifications needed for scalability and drive the change
- Monitor applications in production and lead development of patches where required
- Review and ensure all code documentation is complete and updated periodically
- Understand, articulate, interpret, and apply the principles of the defined strategy to unique, moderately complex business problems that may span one or more functions or domains
Skills
- Bachelor's degree or equivalent in Computer Science and 4 years of experience in software engineering, data engineering, database engineering, business intelligence, business analytics, or a related field; OR Master's degree or equivalent in Computer Science and 2 years of experience in software engineering, data engineering, database engineering, business intelligence, business analytics, or a related field
- Experience with designing and implementing Data Lakehouse and Data Lake architecture (Databricks, S3, and Glue)
- Experience with real-time Change Data Capture (CDC) ingestion frameworks (Kafka, Debezium, and Kinesis)
- Experience coding and debugging in an object-oriented programming language (Python)
- Experience with Big Data processing using Spark and PySpark (Spark Streaming, Spark SQL, and Python Pandas)
- Experience developing and optimizing data pipelines and reusable frameworks (Glue, ETL, and ELT)
- Experience with CI/CD practices (Jenkins, Harness, and GitHub Actions)
- Experience designing and querying relational and NoSQL databases and data warehouses (Redshift, MySQL, MongoDB, Netezza, and DB2)
- Experience with cloud infrastructure and serverless computing for data solutions (AWS Services such as Lambda, CloudFormation, CloudWatch, and IAM)
- Experience developing analytical, AI, and ML data platforms in AWS for decision support
- Experience with Unix shell scripting for job orchestration, automation, and scheduling
- Experience with data modeling and schema design (dimensional modeling, normalization, and denormalization strategies)
- Experience with data security and governance (implementing data quality checks, metadata management, and compliance)
- Employer will accept any amount of experience with the required skills
Benefits
- Performance-based incentive awards
- Medical, vision and dental coverage
- 401(k)
- Stock purchase
- Company-paid life insurance
- PTO (including sick leave)
- Parental leave
- Family care leave
- Bereavement
- Jury duty
- Voting
- Short-term and long-term disability
- Education assistance with 100% company paid college degrees
- Company discounts
- Military service pay
- Adoption expense reimbursement
Company Overview